NVIDIA, the world’s largest AI chip manufacturer, just made waves by announcing a new cutting-edge superchip as well as a family of models that promise to revolutionize how AI interacts with our physical world. On this episode of Pioneers of AI, Kari Ann Briski, Vice President of AI Models, Software and Services at NVIDIA, sits down with us to explore the company’s full-stack approach to AI. Briski shares the company’s journey from accelerated computing to generative AI, the rise of sovereign AI, and how NVIDIA is helping start-ups and enterprises build the infrastructure of the future.
About Kari
- VP of AI Models, Software & Services at NVIDIA; leads AI software strategy & roadmaps
- 20+ years in software development; 15+ years leading tech product management
- Helped shape NVIDIA's generative AI, inference microservices, and sovereign AI efforts
- Built enterprise AI partnerships with leaders like AWS, Meta, and IBM
- Computer engineering degree from the University of Pittsburgh
Table of Contents:
- Why NVIDIA built beyond chips into a full AI stack
- Why inference now matters as much as training
- What product leadership looks like inside NVIDIA
- How sovereign AI helps nations build on their own terms
- Why AI agents need many models working together
- When smaller language models beat bigger ones
- How NVIDIA is widening access for startups and enterprises
- Why energy efficiency is becoming central to AI infrastructure
- Episode Takeaways
Transcript:
From superchips to sovereign AI, with Kari Ann Briski
RANA EL KALIOUBY: NVIDIA – the largest AI chip maker in the world – announced some pretty big news to kick off 2025.
At CES – the huge annual tech convention in Las Vegas, CEO Jensen Huang debuted the Grace Blackwell Superchip. He also announced a family of new foundational AI models, called Cosmos.
On this episode we’re going to talk about a different side of NVIDIA’s business, but before we get into that, I want to take a moment to talk about how important these new technologies are.
First, let’s take the superchip. NVIDIA is using this chip to power a new kind of PC – an AI personal supercomputer – one that can fit on your desk. This computer can run large language models up to 200 billion parameters in size! Your average PC definitely can’t train or run these large models locally. For context, widely used LLMs range from tens of billions to hundreds of billions of parameters, and depending on the exact model size and how complex your prompt is, you may need multiple high-end GPU servers to run these models.
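For a rough sense of why parameter counts matter so much, here’s a back-of-the-envelope sketch of the memory needed just to hold a model’s weights. The bytes-per-parameter figures are generic for common numeric precisions, not NVIDIA-specific numbers, and real deployments need extra memory for the KV cache and activations on top of this:

```python
def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    """Memory (GB) to hold just the model weights, ignoring KV cache and activations."""
    return params_billions * 1e9 * bytes_per_param / 1e9

# Common precisions: FP16 = 2 bytes/param, FP8 = 1 byte, 4-bit quantization = 0.5 bytes.
for precision, bytes_pp in [("FP16", 2.0), ("FP8", 1.0), ("FP4", 0.5)]:
    print(f"200B params @ {precision}: ~{weight_memory_gb(200, bytes_pp):.0f} GB of weights")
```

At full FP16 precision a 200-billion-parameter model needs roughly 400 GB just for weights, which is why aggressive quantization is what makes desk-sized machines plausible for models this large.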
And no, the average person won’t need this kind of compute power – yet! But you know who will? AI developers who are building new projects. Or folks who’re researching algorithmic innovations.
Then there’s Cosmos, the family of new foundational models. This is the product I’m most excited about. We’ve all seen a lot of large language models which can generate text, image, and even video. But Cosmos is something different.
It’s trained on real footage of human movement and will be used to power embodied AI like humanoid robots and industrial robots. Of course a home robot, like Rosie from the Jetsons, would be awesome. I personally can’t wait for a robot to do my laundry!
But there are so many other applications for this model. Think about its impact in manufacturing, infrastructure and even personal health. Imagine an AI-powered wearable device that can recommend the perfect diet for your needs. Or an autonomous robot that can repair an underwater tunnel.
Going into 2025, one of my top predictions is that we’ll see a rise in embodied AI. 2024 was undoubtedly the year of chatbots – but we live in a physical world. Embodied AI in the form of robots and wearable technology will augment our human abilities – both physically and mentally.
As you might already know, NVIDIA is currently the second most valuable company in the world, only slightly lagging behind Apple.
The company’s AI chips get so much attention. But actually NVIDIA is a full stack company – which means that they offer a suite of AI infrastructure products and tools that allow companies to design, build, and deploy AI systems. That means everything from the hardware – AKA the chips – to AI-powered software.
I got the chance to sit down with Kari Ann Briski, who oversees generative AI at NVIDIA.
On our episode today, Kari is giving us a look under the hood at NVIDIA’s software business. We’ll talk about everything from the competitive advantage of running a full stack company to the rise of sovereign AI, as nations around the world create their own AI foundational models.
As a heads up, we get into some weedy territory. But this is a great listen for every level of technical knowledge.
I’m Rana el Kaliouby and this is Pioneers of AI — A podcast taking you behind the scenes of the AI revolution.
[THEME MUSIC]
Before getting into Kari’s work at NVIDIA, I wanted to know more about her background, and how she uses AI in her daily life.
EL KALIOUBY: So before we get into talking NVIDIA, I am always curious about how people use AI in their day to day lives. So do you use AI at all outside of work?
KARI ANN BRISKI: Yes, and I wish I could use it more. I’ll say that. I think that, being in the industry and seeing the power of AI and where things are going, you almost get a little impatient in some day to day situations. You’re like, oh man, I know this is going to be great in a year or two. I think about just doctor’s visits with my kids, or I just started talking in my car the other day and then I had to giggle because it wasn’t listening and I forgot I actually had to touch the wheel to make a command. So yeah.
EL KALIOUBY: That is actually funny you bring that up because in my former life I was a CEO of Affectiva and we did a lot of work for the automotive industry. And our whole vision was this kind of in-car companion with emotional intelligence that you can kind of talk to about planning dinner tonight. We’re not quite there yet, but wouldn’t it be cool? So you and I are both passionate about getting more women in tech. I’m curious. What was your path to tech?
BRISKI: I think with some people you just know, right? I was always the single girl in the basement of high school taking drafting classes. And when I went to college, computer engineering was a new dual degree where you could do electrical engineering and computer science, and by the end of my freshman year, I was like, that’s what I want to do. And I still was only one of five women that graduated in computer engineering in 2000. And it’s funny because I actually used to help coach at local high schools and one of the girls I coached, she went to computer engineering and she called me and she’s like, there’s only five girls in my class. Why is that? So yeah, we have to keep pushing.
Why NVIDIA built beyond chips into a full AI stack
EL KALIOUBY: We can totally do a whole episode on just how to get more women in AI, right? There’s a lot more work we need to do there. All right, so most people know NVIDIA for their AI chips work, right? The kind of hardware microchips or the graphics processing units on which we run AI models, both for training and for inference. But what most people don’t realize is that NVIDIA is actually way more than that, and it’s a full stack company, and I kind of want to take a moment just to explain to our listeners what we mean by full tech stack. Especially in an AI tech stack, we mean everything from the hardware layer all the way to the application layer and everything in between. So that’s the infrastructure, right? The model training, the model inference, the model validation, the deployment tools, the applications, and I think a lot of people don’t realize that.
So why did NVIDIA make this choice or this decision to be a one-stop shop for all things AI?
BRISKI: Wow, yeah. I don’t know if it’s a choice, but it was a passion maybe of, you know, you want to achieve your dreams and what’s the whole purpose, not the what are we building, but why are we building it? But if I were to take a step back and talk about full stack, Jensen always talks about there being two simultaneous platform transitions in computing, and the first transition was from general purpose computing to accelerated computing. Accelerated computing not only speeds up applications, but also reduces costs and energy consumption. So this allowed new types of applications to be explored.
EL KALIOUBY: The accelerated computing powered the gaming industry to start with, right? And it’s this idea that you can do a lot of parallel computations on the same chip.
BRISKI: And so that’s kind of the first paradigm shift. And so the second one was enabled by accelerated computing, so the development and deployment of software in a new way. And that’s what you were mentioning with generative AI.
And so that’s about massive AI models that can understand and learn from vast amounts of data. There’s data processing and fine tuning and evaluating and optimizing and guardrailing of all these models.
EL KALIOUBY: So to recap, these two shifts in computing – The rise of accelerated computing and the generative AI explosion – are changing the tech landscape. Accelerated computing was initially critical for the gaming industry and also in scientific work where large amounts of data needed to be crunched.
But now with the rise of AI, it’s not just the gaming and research industries that need accelerated computing. Every industry does.
And NVIDIA, as a full stack company, is well prepared to provide that infrastructure.
BRISKI: So if you think about it, every back office enterprise software tool that never needed a GPU before now needs a GPU, needs accelerated computing, because it’s going to be running generative AI as part of its software suite.
Why inference now matters as much as training
EL KALIOUBY: I think you’re also implying that you absolutely need accelerated computing if you are building these models, but it sounds like you also would need them if you’re deploying these models for inference as well.
Inference – this is the stage that comes after you’ve trained your models. It’s when the AI model is drawing real time conclusions or answering questions or synthesizing information. For example, every time you prompt ChatGPT with a question – that’s inference at work.
We need accelerated computing not only for training but also for inference. And Kari says that there are two reasons why.
BRISKI: Even though we’re doing a really great job at distilling knowledge and getting large knowledge packed into smaller models, the most accurate models are still the really large models. And we definitely see that the larger the models, the more data, the more accurate they are.
So that’s one. And I think what people sometimes don’t understand is that people think you ask a question and you hit one model and it comes back with the answer, and that’s absolutely not what’s happening. Even today, it’s not just a model, it’s a full system.
And you’re hitting at least a dozen LLMs, large language models, to get an answer. And I think in the future when we have agents and agentic AI, you’re going to have agents working on your behalf, talking to other systems and agents. And so there’s going to be dozens of models that need to work. And so that latency is really going to need accelerated computing.
EL KALIOUBY: I love that. And we’re going to come back to the agentic AI conversation, because that’s super important. But yeah, you’re right, it’s not just a call to OpenAI ChatGPT, right? It’s often a combination of a whole set of models playing together via APIs.
BRISKI: Yeah, you could have large language model routers, because you have an LLM that’s fine tuned for maybe your finance system and an LLM that’s fine tuned for your supply chain management system. And then that all gets coalesced together and then generated for an answer. And so, yeah, you’re going to have a lot of them working for you.
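The router idea Briski describes can be sketched in a few lines. This is a deliberately toy version with hypothetical model names and a simple keyword classifier; in a real system the router would typically be a model itself, not a keyword match:

```python
# Toy LLM router: pick a fine-tuned specialist model for each query's domain.
# Model names and keyword rules here are hypothetical, for illustration only.
ROUTES = {
    "finance": ("finance-llm", ["invoice", "budget", "revenue"]),
    "supply_chain": ("supply-chain-llm", ["shipment", "inventory", "supplier"]),
}
DEFAULT_MODEL = "general-llm"

def route(query: str) -> str:
    """Return the name of the model that should handle this query."""
    q = query.lower()
    for model, keywords in ROUTES.values():
        if any(k in q for k in keywords):
            return model
    return DEFAULT_MODEL

print(route("Where is shipment 1042?"))  # supply-chain-llm
print(route("Summarize Q3 revenue"))     # finance-llm
```

The responses from the specialist models would then be coalesced by another model into the final generated answer, which is how a single question ends up touching a dozen LLMs.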
What product leadership looks like inside NVIDIA
EL KALIOUBY: So you head up generative AI products and product management at NVIDIA and you’ve been at NVIDIA for the last eight years. Can you kind of explain what your role entails, both to a tech savvy audience, but perhaps also to folks who are not deeply immersed in the tech industry.
BRISKI: Yeah, I think as a product manager, you have to know the why. Right. So why do we need to have reduced precision, right? Why do we need sparsity? Why do we need these features inside our hardware and systems? Because we’re trying to build the next speech synthesis model, or diffusion models. And so we build our own models so that we can kind of push the limits of our hardware systems as well.
And so knowing the why, and so having that roadmap of what we’re building, why we’re building it, positioning it in the market. At NVIDIA, we really love our partners, and we want our partners to succeed, too.
EL KALIOUBY: And NVIDIA has a lot of partners. Some of them are the biggest names in tech like, Meta, Amazon Web Services, and IBM.
BRISKI: And so a partnership is when you both have something to provide, but also commitment to each other to help each other succeed, right? And so knowing your position in the market, how to position with your partners, what’s your value that you bring, right?
And not just partner with me because I’m NVIDIA, but because I’m bringing value to you and you’re going to provide value to me. And so I think that’s part of the role of the product manager as well. I call my product managers T people. So they have to go broad, but they have to be able to go really deep very quickly. Because you’re interacting with engineering architects all the time, and so you have to be able to go toe to toe, you have to be able to say, hmm, that doesn’t make sense, can we talk about this, and let’s maybe change some things.
EL KALIOUBY: I am 100 percent going to use that “T people” example. Genius! But what do these partnerships actually look like and why do they matter to you? Think BIG, like changing the infrastructure of an entire country kind of big.
We’ll get to that after a short break.
[AD BREAK]
How sovereign AI helps nations build on their own terms
EL KALIOUBY: So today, most of the models we recognize are U.S.-based, like OpenAI’s ChatGPT, or Google’s Gemini, or Anthropic’s Claude. But of course, nations around the world want to invest in what we call sovereign AI, or AI that is kind of built in house, right? They want to build their own infrastructure, and leverage their own data, and also hire their own workforce. It sounds like this is an area NVIDIA is really kind of doubling down on. Can you say more about how NVIDIA is helping nations build their own versions of AI? And what does that look like?
BRISKI: Yeah, I think what’s interesting is that a couple years ago, nations were some of the first movers on understanding the value of AI infrastructure, because they knew that they needed models in their language, preserving their national culture, colloquialisms, and it’s just better for them to stand on their own two feet.
AI Sweden was actually one of the first to build their own foundational model.
EL KALIOUBY: How interesting, I did not know that.
BRISKI: Yeah. So I think the ability to sort of stand on your own two feet with AI is always a good thing, not just for your nation, but also for jobs in your nation, for people to understand how to use AI in your nation. And then you’re using a model that understands you and your culture.
EL KALIOUBY: And that would be an example of, say, an AI product or an AI software product that’s on your roadmap where you’re kind of building these tools to support these nations.
BRISKI: Yeah, so what we do is our products support the building of these models, right? So think about the architectural infrastructure — and we call this a foundry, actually — because we bring the pillars and the expertise of either community models or our own models, the tools to be able to do it, the algorithms, and the ability to optimize, and then the compute necessary. And then they bring the data and the model is theirs.
EL KALIOUBY: Now, of course, this isn’t just at the nation’s level, but at the enterprise level too. So are there any cool examples of how you’re helping enterprises build and deploy AI?
BRISKI: Enterprises, maybe a year ago, were really dabbling with maybe operational efficiency or productivity as the first use cases. Especially for customer service. If you have a customer who’s called 20 times and you have an agent that gets on the phone, you can quickly summarize what’s been happening. So I think summarization was a real easy one for enterprises to go to.
And now, really everyone’s sort of diving into virtual assistants and agents. And so being able to retrieve information and then be able to generate answers. And so basically it’s that sort of search or deep recommender system for your own company that’s connected to all your important systems and applying generative AI to that.
Why AI agents need many models working together
EL KALIOUBY: Yeah. So let’s actually double click on the agentic AI. An AI agent is basically an AI that can act on your behalf, whether you’re an individual or an organization, and it sounds like this is something NVIDIA’s investing in. So can you tell us more?
BRISKI: Once you’ve trained or fine tuned a model in your domain, or a task, you want to go put it to work. Our concept — we call them NIMs. Inference is one word, but we call it NVIDIA Inference Microservices — NIMs is a lot easier to say.
And I mentioned to you earlier, it’s not just one model, it’s multiple models working together. And so how do they work in concert? How does an agent reason about a task? I think the term AI has become pretty numbing to people. But I think when you really think about the types of models and the evolution of AI, I’m just going to kind of backtrack for a second and geek out — like first it was around computer vision. And I do remember someone saying, well, it’s all solved, ImageNet happened, we’re done. I’m like, well, we’re just at the tip of the iceberg, right? For artificial intelligence, what is intelligence? You have to not only see a vision, but you have to listen, you have to be actively listening, you have to internalize and understand. And if you’re given a task, you have to make a plan and then go do that task and then check on it and be iterative about it.
And then you have to be able to speak and have speech synthesis, right? So all of these different models, being able to put them together — and now we’re at that time, right? So these agents go off and they need to understand the prompt or the question or the task that was given them.
They need to make a plan. They have to execute on that plan. And when the answers start to come back, they need to say, does this make sense? So I need to go back again and iterate until it does make sense.
EL KALIOUBY: You’re also underscoring something very important here, which is that AI should be and is fundamentally multimodal, right? To your point, it’s bringing all these different channels of information and data, just the way humans do, like with our vision, with our listening, with our kind of interpretation and intention modeling, and all of that.
One area I’ve spent a lot of time building is emotional intelligence and building empathy into these interfaces and these technologies. And kind of back to the car example, you kind of want your car to see that you’re okay, you’re a little stressed driving the kids to soccer practice. Yeah. Say more.
BRISKI: So like even, you want your agent or AI to be able to adapt — adaptive and situational, right? So I might have a different voice that I’m talking with you right now than I am in the car screaming at my kids to get their shoes on and get in the car in the morning. Or I might have a different voice if I’m in a serious situation or if you’re just out at a bank or in a retail shop ordering boba tea or whatever the kids do these days, right? But you definitely want it to be able to adapt.
When smaller language models beat bigger ones
EL KALIOUBY: Yeah, absolutely. Now this also brings me to another question, which is there’s a lot of interest in these language models being larger and larger, but NVIDIA released a small language model which has accuracy comparable to these large language models, or LLMs. Can you say more and why is that important?
Why do we need these small language models?
BRISKI: Well, I think it’s dependent on the deployment scenario. Maybe you’re on an edge device, or you’re in a situation with more classification type scenarios, or you’ve been able to fine tune a smaller model for a certain situation, or you’re able to distill a larger model down to a smaller model to get, you know, 90 percent of the accuracy but 3 times the throughput. People are interested in that, A, for the inference footprint cost of the deployment scenario, but also because that situation might be OK with a loss of accuracy.
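The tradeoff Briski sketches can be made concrete with a little arithmetic. The accuracy and throughput figures below are her illustrative numbers; the assumption that 3x throughput translates to roughly a third of the per-query cost is ours, for illustration:

```python
def cost_per_correct(accuracy: float, cost_per_query: float) -> float:
    """Expected spend per correct answer: cost divided by the chance of being right."""
    return cost_per_query / accuracy

# Hypothetical numbers: a large model at 95% accuracy and unit cost per query,
# versus a distilled model keeping ~90% of that accuracy at ~1/3 the cost.
large = cost_per_correct(accuracy=0.95, cost_per_query=1.0)
small = cost_per_correct(accuracy=0.95 * 0.9, cost_per_query=1.0 / 3)

print(f"large: {large:.2f} per correct answer, small: {small:.2f}")
```

Under these assumptions the small model is far cheaper per correct answer, which is exactly why the routing decision hinges on whether a given query can tolerate the accuracy loss.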
But that’s where I was talking about LLM routers — where if you have a situation where you absolutely need accuracy, you want to route to the extremely large model that has the highest accuracy.
EL KALIOUBY: Yeah, I want to go back to the automotive example, and I love that you started with that, because I think we can kind of apply a lot of our discussion to the car situation. So if you’re using large language models in the cloud, and say you’re building an in-cabin sensing solution that detects driver drowsiness or driver distraction, if you’re relying on the cloud, you now have to stream all the data to the cloud all the time.
And that is expensive. There’s latency, like it can slow down the results. But also there’s privacy concerns. You may not want your video data in the car being streamed to the cloud. And this is where—
BRISKI: True for enterprises, right? You know what’s happening with your data and where it’s going.
EL KALIOUBY: And you’re owning it and you can run it on the edge, like you said. So all the processing happens locally and you don’t need to send your data anywhere, which to your point, a lot of enterprises will want to prioritize.
BRISKI: That’s right.
How NVIDIA is widening access for startups and enterprises
EL KALIOUBY: There is so much more I want to ask Kari about her work at NVIDIA. We’re going to take a short break.
[AD BREAK]
EL KALIOUBY: But more of our conversation in a minute.
With my investor hat on, I do worry that big tech is hogging all the AI chips, and that stifles innovation, both at the startup level, where I want to make sure that startups have access to the latest and greatest chips to build their AIs, and also at the enterprise level.
So how is NVIDIA helping to democratize access to both AI chips and AI software?
BRISKI: I think that there’s a lot of innovation happening at the startup level. I think that is why we have this Inception program for startups who need additional help, extra attention, sometimes compute, sometimes libraries, sometimes advice. And the last time I checked, we’re somewhere around like 18,000 Inception companies that we work with. I think what’s interesting is that the time has changed — you can have a smaller number of people working in a startup, if you have access to compute, because you can actually get a lot more done. Again, when I was talking about productivity gains, you can now take a maybe 10-person engineering company and increase their productivity tenfold. Now you have the equivalent of a hundred-person company of engineers working for you.
And then working with our ecosystem partners, right? So we don’t just work with the big companies, but we work with our ecosystem partners to make sure that they are integrating our tools and have access to our tools. And so they can now come through different channels faster to the enterprise.
Why energy efficiency is becoming central to AI infrastructure
EL KALIOUBY: So today AI is very energy hungry. I read somewhere that it takes about the entire energy consumption of the whole country of Costa Rica to train an AI model using NVIDIA’s H100s, which in my mind is not sustainable.
This is not a sustainable way to train and deploy AI. Do you see that changing with some of NVIDIA’s newer chips?
BRISKI: Yeah, I think most people are actually kind of surprised how much we think about energy efficiency at NVIDIA. A lot of the world’s top 500 greenest supercomputers are powered by NVIDIA GPUs. And the purpose of accelerated computing, again, is that you do more with less, so you accelerate it and you’re actually saving energy in the long run. We want people to have a smaller inference footprint.
We want the smaller model with more accuracy, because we know that you’re gonna need so much more use out of it. So we’re always thinking about energy efficiency, not just for training, but also for inference.
EL KALIOUBY: That’s great. All right. Final question. If you could have AI do anything for you, what would you have it do?
BRISKI: Could it do laundry? I don’t think it does.
EL KALIOUBY: Second that. Totally second that. And fold it, right?
BRISKI: And fold it. Yeah. Like, in all seriousness, I think the point of artificial intelligence is to have a better human computer interaction. It’s this interface, right? Like, we’ve had to bend ourselves to the way we input things into our computers, because that’s how computers understand, and now we’re finally at a place where we have this way of being able to talk to an interface naturally, and it is able to respond back, and again, go off and achieve things and do things. And so I’m excited — I always think that there’s not enough hours in the day, and so if I could be me times 10, that would be great, to get the things that I want to get done in the middle of the day.
EL KALIOUBY: Absolutely. Love that. Thank you, Kari, for joining us on the show.
BRISKI: Rana, it’s been a pleasure. Thank you so much for having me.
Episode Takeaways
- Rana el Kaliouby opens by spotlighting NVIDIA’s new Grace Blackwell Superchip and Cosmos models, arguing that embodied AI could be the next big leap after chatbots.
- With Kari Ann Briski, she reframes NVIDIA as far more than a chip company, explaining how its full-stack approach spans hardware, software, training, inference, and deployment.
- Briski says accelerated computing now matters at every stage of AI, because modern systems rely on many models working together and agentic AI will only raise the need for speed.
- The conversation then turns to sovereign AI, with NVIDIA helping nations and enterprises build models around their own data, languages, culture, and operational needs.
- They close on practical tradeoffs around small versus large models, startup access, edge privacy, and energy efficiency, while agreeing the dream use case is still AI that does the laundry.