With Groq, Jonathan Ross is taking AI inference to new speeds
Jonathan Ross and his company Groq are building AI chips focused on fast inference, increasing the speed with which AI models can process new information. But it’s not just about speed. Faster inference means that AI can deliver answers more quickly, and in turn save on energy and compute costs. In this episode of Pioneers of AI, we explore why Groq’s chips are making waves, how AI will create more jobs, and the importance of democratizing access to AI.
About Jonathan
- Founder & CEO of Groq, a multi-billion-dollar AI inference company (2025)
- Led development of Google’s TPU, the pioneering custom AI accelerator
- Invented Groq’s LPU for ultra-fast, energy-efficient AI inference
- Built a vertically integrated inference cloud used by 1M+ developers (2025)
- Closed a $1.5B Saudi Arabia deal and deployed ~20,000 chips in 51 days (2025)
Table of Contents:
- Why Groq was built to widen access to AI compute
- How a different chip architecture creates speed and efficiency
- Why faster inference can improve the quality of AI answers
- Why Groq sees inference as a distinct market from training
- How energy efficiency becomes a limit on AI growth
- Why Groq built a full stack platform instead of just selling chips
- What keeps startups from getting the compute they need
- Why AI may expand demand for work instead of eliminating it
- How open source models are reshaping competition and trust
- Why universal AI access and human agency matter together
- Episode Takeaways
Transcript:
JONATHAN ROSS: I want you to imagine walking up to a computer and typing in a Google search and waiting a minute for an answer. You would just give up and go on and do other things, right? And so once you get speed, you can never go back. It’s sort of like getting broadband. You never wanna go back to dial up.
RANA EL KALIOUBY: That’s Jonathan Ross, founder of Groq, one of the rising AI chip makers out there. And Jonathan says that the reason why people want fast internet is the same reason why people want fast AI.
ROSS: And so right now everyone is fighting for market share in AI. And if you have a high latency version of a product, you are competing with one hand tied behind your back.
EL KALIOUBY: Groq – with a Q not a K, so not to be confused with X’s AI chatbot – is NOT one of those companies. They make low latency a priority. Their AI chips – called LPUs – focus on inference: the stage where a trained AI model applies everything it learned during training to produce an answer. So basically that answer ChatGPT just gave you on what to cook for dinner – that’s inference.
Groq is staking their claim on fast inference. And as a multi-billion dollar company, their bets are paying off.
Today, I’m talking with Jonathan about how his chips provide faster inference while saving energy, why AI will create more jobs, and the importance of democratizing access to AI.
I’m Rana el Kaliouby and this is Pioneers of AI – a podcast taking you behind the scenes of the AI revolution.
[THEME MUSIC]
Well, Jonathan, welcome back to Pioneers of AI. We last saw each other in December at the Fortune Brainstorm AI Conference, which was an awesome conversation. So I am so glad to have you back on.
ROSS: Oh, thanks for having me. I appreciate it.
Why Groq was built to widen access to AI compute
EL KALIOUBY: So to kick us off, I wanna take us all the way back to when you started Groq. Before that you were at Google, and I’m just always curious to hear people’s kind of founding stories and what was the tipping point that led you to start Groq.
ROSS: So you wanna know what my radioactive spider bite is? So I started the Google TPU, that’s the AI chip that Google uses. And I did it as a side project. It started to become known that Google had made an AI chip. We actually had it in production for about a year before it was ever announced. And so I started getting calls from other companies that also wanted to build AI chips. And one of them made a pitch to me, which was, come join them, because Google had a tremendous advantage in having their own AI chip at the time.
It was 10 times faster than any GPU. And the pitch was: make sure that there isn’t a concentration of AI in one place. Help us make sure that there are two players. And I was thinking, gosh, that’s a pretty good argument, but one versus two, that’s not a huge improvement.
EL KALIOUBY: All right.
ROSS: Make sure that everyone gets access. So, that was in the back of my mind, but I still didn’t intend to do a hardware startup because frankly it’s hard. And then I realized, well, gosh, people are giving away the models for free. They’re giving away the frameworks like TensorFlow for free. I didn’t feel that software was gonna be a great moat. So as I was going out and talking to investors, one of them asked, well, what would you do differently? And I said two things. Number one, I would start with the software. The software is a mess on all of these AI chips. Even TPU, it’s very difficult. And number...
EL KALIOUBY: As in, it’s like not optimized for these chips.
ROSS: It’s optimized for it, but you have to optimize it by hand.
EL KALIOUBY: Mm Okay.
ROSS: It takes a long time. But the other realization was everyone knows about Moore’s Law and that the number of transistors doubles every 18 to 24 months. What we realized was actually the number of chips was also doubling every 18 to 24 months. And so if you were doubling the number of chips, that meant that effectively you could act as if the number of chips was infinite, and if the number of chips was infinite, how would you design differently? And so we came up with a very different architecture that doesn’t even use external memory. We just use a larger number of chips and we lay out the models and the computational problems across those chips. And it reduces the cost. It improves the speed. It’s a very different architecture than anything else that exists.
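To make the assembly-line idea concrete, here is a toy sketch of the two execution patterns Ross is contrasting. Every number in it is a hypothetical placeholder, not a Groq or GPU benchmark; the point is only that a pipeline of chips holding weights on-chip skips the per-layer cost of fetching weights from external memory.

```python
# Toy model (all numbers hypothetical) of two ways to run a layered model.

LAYERS = 8
WEIGHT_FETCH_MS = 5.0  # hypothetical cost to pull one layer's weights from DRAM
COMPUTE_MS = 1.0       # hypothetical per-layer compute time
HOP_MS = 0.1           # hypothetical chip-to-chip transfer time

# One chip with external memory: every layer pays the weight-fetch cost.
single_chip_ms = LAYERS * (WEIGHT_FETCH_MS + COMPUTE_MS)

# One layer per chip, weights resident on-chip: no fetches, just compute
# plus a short hop to the next chip in the pipeline, assembly-line style.
pipelined_ms = LAYERS * (COMPUTE_MS + HOP_MS)

print(f"external-memory chip: {single_chip_ms:.1f} ms per pass")  # 48.0
print(f"pipelined chips:      {pipelined_ms:.1f} ms per pass")    # 8.8
```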
How a different chip architecture creates speed and efficiency
EL KALIOUBY: Yeah, so the AI chip market looks a lot different today than it did when you started Groq, and so much has happened, including, of course, the generative AI explosion. And there are giants like Nvidia, but there’s also competitors like you and other companies. We actually had Gavin Uberti, the CEO of Etched, on our show recently. They do specialized chips that are very focused on transformer models. I just think the whole AI chip space is very ripe for disruption. And I’m curious, like, what do you make of the current kind of AI chip space and where does Groq fit into all of that?
ROSS: Well, history doesn’t repeat itself, but it rhymes. And our inspiration came from AlphaGo.
EL KALIOUBY: AlphaGo is an AI model that was developed at Google DeepMind to play the game Go. At the time, Jonathan was still at Google, and he says that the AlphaGo team reached out to him during a critical stage in its development: a five-game match with top Go player Lee Sedol.
ROSS: The test game against Lee Sedol didn’t go very well, and so we got an email saying, is your chip as fast as we’ve heard? This was the TPU at Google, and the answer was, yes it is. We didn’t know how fast they had heard, but that’s the appropriate answer, right? We spent the next 30 days furiously recompiling their model to run on the TPU. Even though the model didn’t change, it performed dramatically better. It went from losing to winning dramatically. That was the realization that compute was going to influence the quality.
Why faster inference can improve the quality of AI answers
EL KALIOUBY: Why would that be the case, by the way? Why would it change its answer?
ROSS: Well, let’s think of it this way. Imagine that you are writing an essay. I tell you that when you write that essay, you have to do it without hitting the backspace or delete key even once. How good is that essay gonna be?
EL KALIOUBY: Right. Not so good.
ROSS: Now, imagine you ask 10 different people to write essays in parallel, not seeing what the others are writing, without the backspace or delete key, and you pick the best one. It’s better, but it’s still not gonna be great. Now, if you let one person iterate 10 times, there’s a good chance it’s gonna be a lot better. That’s especially true with coding and technical problems. So by being able to iterate, you actually improve the answer. This is one of the reasons we focus so much on latency: given how long it takes to get answers from LLMs today, as you add reasoning, the wait only gets worse.
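Ross’s essay analogy maps onto two common sampling strategies. Below is a minimal sketch, with a stubbed `llm()` standing in for any model; the function and its quality scores are hypothetical, purely to show the shape of each loop. Best-of-N runs drafts in parallel and keeps the winner, while iterative refinement feeds each draft back in so improvements compound – and low latency is what makes ten sequential revisions affordable.

```python
import random

def llm(prompt: str) -> float:
    """Stub standing in for a model call; returns a quality score in [0, 1]."""
    return random.random()

def best_of_n(prompt: str, n: int = 10) -> float:
    # Ten writers in parallel, no backspace key: keep the best single draft.
    return max(llm(prompt) for _ in range(n))

def iterative(prompt: str, rounds: int = 10) -> float:
    # One writer revising: each pass starts from the previous draft, so
    # improvements compound instead of resetting. Only viable when each
    # round comes back quickly -- hence the focus on latency.
    quality = llm(prompt)
    for _ in range(rounds - 1):
        quality = min(1.0, quality + random.uniform(0.0, 0.1))
    return quality

random.seed(0)
print(f"best of 10:   {best_of_n('essay'):.2f}")
print(f"10 revisions: {iterative('essay'):.2f}")
```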
EL KALIOUBY: The AlphaGo trials unfolded in 2016. But that realization of the importance of compute speed eventually led to Groq – and its chip specifically designed for Gen AI. Now nearly a decade later, Groq is a multi-billion dollar company. They actually just landed a 1.5 billion dollar deal with Saudi Arabia.
ROSS: So the way that we deploy around the world is we partner up. We’ve been working with Saudi Arabia. We’ve been talking to them for about a year, but this moved very quickly. We decided to do this about four months before we completed it. We signed the contract and 51 days later we had almost 20,000 chips up and running and serving traffic in the country.
EL KALIOUBY: Yeah. Amazing. What are they doing with all of these chips? What are the applications?
ROSS: It’s mostly commercial. And it’s not all Saudi Arabia, actually – we’re serving traffic from all around the world. When you think about it, training is a bit of a local game: you do it in a data center somewhere in the world. Inference is a global game. That’s one of the reasons you see so many people rushing to build out their infrastructure. But one of the advantages of being fast is that even from halfway around the world, we can complete a query faster than anyone else. We did those 20,000 chips, and that worked out so well that we’ve signed this deal to expand it quite significantly this year.
EL KALIOUBY: Yes, Groq is going full speed ahead with their LPUs. But Jonathan doesn’t see his company in competition with the biggest AI chip maker, Nvidia. We’ll get to why after a short break. Stay with us.
[AD BREAK]
Why Groq sees inference as a distinct market from training
EL KALIOUBY: Do you see Groq as an Nvidia challenger?
ROSS: Not really. I think when you look at what Nvidia does well, they do training amazingly well. And with training, all you need is brute force compute.
Doesn’t matter how long it takes, it doesn’t even matter if it’s particularly expensive because you’re gonna amortize it across all of the usage.
When you’re talking about inference, typically you’re gonna have about a 10 to 20 times larger deployment than training. That’s where it starts to become very cost sensitive; it’s also where you need more scale, and where latency matters. So our expectation is that Nvidia is going to continue doing very well in training.
In fact, we’ve had customers ask us when we show them the speed of our demo, should we just not buy Nvidia GPUs? Should we just buy LPUs? My answer to that is always no. Get every GPU you can. First of all, it’s hard to get them. Second of all, you need them for training. And third of all, the more inference you do, the more you’re gonna wanna train your models to optimize them more, because then you get more out of each inference.
So the more inference, the more training, the more training, the higher the quality, the more the demand goes up. Every time a new model’s released, our usage spikes. So it’s a virtuous cycle. Now, of course, a large incumbent wants every part of the market, but success is usually based on an element of focus.
GPUs are not designed for inference. They’re expensive, and they’re just not low latency.
EL KALIOUBY: Yeah. What is your competitive moat? What’s stopping an Nvidia from designing chips that are optimized for inference?
ROSS: One of our moats is that not only did we start early, not only did we build something general that seems to work no matter where the market is moving on different model architectures, but also if we need to adapt to what’s happening, we’re gonna be able to do that faster than almost anyone else, because we just don’t have to rewrite the software. We’re actually moving to a much faster chip cadence.
We’ve hinted that maybe there’s a new chip coming, but there might be another one coming soon after that. And if you can get into a very fast cadence – because most of the work is the software, and if your software’s automatic – then you can iterate. And the speed of iteration is the speed of innovation.
How energy efficiency becomes a limit on AI growth
EL KALIOUBY: Okay. Let’s talk about energy consumption and just sustainability of AI, both on the training side and the inference side. LPUs are a lot more energy efficient than GPUs. So why is this important?
ROSS: Well, you’re only gonna be able to have as much AI as you have energy to power it. There’s a stack up in civilization, right? First you have materials, then you have energy, then you have information. Then you have compute, and compute is different: it’s about creating something contextual in the moment, creatively. But it requires everything else down that stack. If you don’t have the materials, you can’t build the chips. If you don’t have the energy, you can’t power the chips. If you don’t have the information, you can’t train the models. And if you don’t have the compute, you can’t run them. So it’s absolutely fundamental. And then what we did that was quite unique: rather than retrieving data from external memory, we actually just have a large number of chips, and each chip does a little bit of the computation and then passes it on to the next set of chips, sort of like an assembly line, right?
And the reason that that’s so efficient is when you’re reading from external memory, there’s a wire, and that wire gets charged and discharged. The longer that wire is, the more energy; the wider that wire is, the more energy.
That memory is not inside the chip. It’s actually quite far away. And so all of that data that you’re reading is requiring a large amount of energy to move from that memory over into this chip. And so when we look at the amount of energy used by GPUs, just the memory reads alone are more energy than we use.
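A back-of-envelope version of the wire argument: the dynamic energy to charge and discharge a wire scales with its capacitance (roughly C·V² per cycle), and capacitance grows with wire length, so the long traces out to off-chip DRAM cost far more per bit than short on-chip SRAM wires. The per-read figures below are often-cited ballpark estimates for older process nodes, not Groq measurements, and the traffic number is hypothetical.

```python
# Ballpark energy comparison: weights read from off-chip DRAM vs. on-chip SRAM.
PJ_PER_32BIT_DRAM_READ = 640.0  # often-cited ballpark; long off-chip wires
PJ_PER_32BIT_SRAM_READ = 5.0    # often-cited ballpark; short on-chip wires

GB_READ_PER_TOKEN = 16.0        # hypothetical weight traffic per token

reads = GB_READ_PER_TOKEN * 1e9 / 4  # number of 32-bit (4-byte) reads
dram_j = reads * PJ_PER_32BIT_DRAM_READ * 1e-12
sram_j = reads * PJ_PER_32BIT_SRAM_READ * 1e-12

print(f"DRAM-resident weights: {dram_j:.2f} J per token")  # ~2.56 J
print(f"SRAM-resident weights: {sram_j:.2f} J per token")  # ~0.02 J
print(f"ratio: {dram_j / sram_j:.0f}x")                    # 128x
```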
EL KALIOUBY: Yeah. That’s amazing. I love the stack explanation too. I had not heard that before. It’s very powerful. So many people think of Groq primarily as an AI chip company. But you are a vertically integrated solution, more like an Nvidia from that perspective. And the Groq Cloud platform – I don’t know the latest number, but it’s probably over 500,000 developers at this point.
ROSS: So last week we crossed over a million.
Why Groq built a full stack platform instead of just selling chips
EL KALIOUBY: That is amazing. Congratulations. So, why was it so important to be a vertically integrated platform, both hardware and software? And what are some of the developers doing on the Groq Cloud?
ROSS: So we always intended to do this, but when we started, capital wasn’t as easy to get.
So our initial plan was let’s sell hardware and eventually we can use that money to build our own cloud. What ended up happening was, it turns out when you develop your own chip and you try and get others to adopt it, you’re actually not solving a problem. You’re creating a problem. The problem is they had something that worked and now you’re trying to give them something that requires a lot of work to get working.
When we were trying to explain to people who were serving their own models what we could do – that it would be much faster – the feedback we got was: why would you ever need an LLM to be faster than you can read? Like a teletype thing, like dial-up, whatever. And we’re like, but that’s not how the internet works. No one wants webpages to load that slowly. Use your intuition. And they couldn’t. Once we put an LLM on our website and showed how fast it was, all of a sudden we went viral, and we immediately started making it available as an API. We had to develop the software for that so that people could build on it.
So if you’ve got a real amazing product, rather than trying to sell it to people on how what they’re doing will get better, just put it out there yourself and go viral yourself.
EL KALIOUBY: Like bring the possibility to life, and then people see it.
ROSS: Now, as for what people are doing on it, you name it. A million developers – we’ve seen it all, everything from people searching documents to people doing legal work. What we’re seeing is most customers are either in the startup bucket or in the very large, Fortune 50 bucket. Not a lot in the middle. We’ve analyzed that, and the conclusion we’ve come to – we could be wrong – is this: for a startup, going from having no customer support to a chatbot that has access to a little bit of information, and sometimes gives wrong answers, but works – that’s better than nothing. It’s an improvement. It solves a problem. They’re not looking for perfect; they’re looking for solving a problem. With the larger companies, what we’ve seen is they’re willing to try a very large number of initiatives. Most medium-sized companies can’t afford to do that.
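For developers who want to kick the tires, here is a minimal sketch of calling a model on GroqCloud through its chat-completions API. It assumes the `groq` Python SDK is installed and a GROQ_API_KEY environment variable is set; the model id is only an example and may change over time.

```python
# Minimal sketch: one chat-completion request to GroqCloud, with a crude
# latency measurement. Assumes `pip install groq` and GROQ_API_KEY is set.
import time
from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

start = time.perf_counter()
response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",  # example model id; check current catalog
    messages=[{"role": "user", "content": "In one sentence, what is inference?"}],
)
elapsed = time.perf_counter() - start

print(response.choices[0].message.content)
print(f"round trip: {elapsed:.2f} s")
```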
What keeps startups from getting the compute they need
EL KALIOUBY: Yeah, that’s super fascinating. So I am very passionate about democratizing access to AI. And in our previous conversations, I know you are too. And with my investor hat on, I’m always thinking about how these startups get access to compute, right? Because that’s going to unlock a lot of innovation. And I sometimes worry that a lot of the compute is hogged by the bigger players. Who do you think is currently being left out and how do we fix that?
ROSS: As always, the way that you’re left out is either it’s just too expensive and you can’t launch what you’re doing.
EL KALIOUBY: Mm-hmm. Mm-hmm.
ROSS: Or you can’t get enough to launch what you’re doing. Like what we hear from customers all the time is they have requested a large number of GPUs, and the wait time is just too long. Others who have paid for GPUs are waiting sometimes as long as a year to get them. What it comes down to is can you launch the application that you’ve developed? So very often people will develop on the most capable model, and then they will attempt to deploy it. It’s too expensive. So then they look for open source models, smaller models, can they port their application?
It’s not like people are typically saying, oh, I just don’t have enough compute. They’re typically saying it’s too expensive – but it means the same thing.
EL KALIOUBY: Yeah. Can you be very specific around what we mean when we say too expensive?
ROSS: Yeah. It used to be that your largest line item in running a business was your employees. Nowadays your cloud budgets are getting pretty close to your people budgets, and that’s only gonna get worse. Think about it this way: if you replace your customer service with an LLM, you’re effectively replacing what some people did, right? Now companies are going sort of hybrid. They hire people, but they also hire LPUs and GPUs. Right?
EL KALIOUBY: Right.
ROSS: Now, it’s not too surprising that people like to do stuff with AI, because it’s repeatable. Once they get it to a certain capability, they can just scale it up – it’s not like having to train another person to do that thing. So they wanna scale up really quickly, and they can’t; they can’t get access to enough compute. What you’re seeing is we’re getting to a point where, cost-wise, we’re almost able to get the same capabilities out of the hardware as hiring sort of minimum wage workers. And then you’re just not gonna be able to scale if you can’t get that compute.
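As a rough sanity check on that comparison, here is the kind of back-of-envelope arithmetic involved; every number below is a hypothetical placeholder, not a Groq price or a measured benchmark.

```python
# Rough arithmetic behind "compute vs. minimum-wage labor" (all hypothetical).
PRICE_PER_M_TOKENS = 0.60       # $ per million tokens, hypothetical
TOKENS_PER_INTERACTION = 2000   # prompt + response, hypothetical
HUMAN_WAGE_PER_HOUR = 15.00     # $, illustrative
INTERACTIONS_PER_HOUR = 10      # a human agent's throughput, hypothetical

llm_cost = TOKENS_PER_INTERACTION / 1e6 * PRICE_PER_M_TOKENS
human_cost = HUMAN_WAGE_PER_HOUR / INTERACTIONS_PER_HOUR

print(f"LLM:   ${llm_cost:.4f} per interaction")   # $0.0012
print(f"Human: ${human_cost:.2f} per interaction") # $1.50
```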
Why AI may expand demand for work instead of eliminating it
EL KALIOUBY: As we think about the cost equation, it’s back to the cost of every call you make. That factors in too, right? If you have a customer service bot and it serves, I don’t know, a thousand customers, and these customers are prompting the bot with all sorts of questions, you gotta take into consideration the cost of each of these calls to the LLM, or to the underlying model. Is that also how it gets expensive?
ROSS: Well, there’s also another thing. I actually believe that AI is going to cause labor shortages. I don’t think we’re gonna have enough workers.
I’ll give an example. I was talking to someone recently who was explaining why they thought all customer support jobs were gonna go to AI. And what they explained was that they had just had this amazing experience calling into a call center, and it was all AI. Everything was AI up until the final step, where it went to a human being to verify the correctness of the actions.
So I asked them how often do you typically call support? He’s like, never. It’s a terrible experience. I’m like, but that was a good experience, right? Yeah. Would you call more often if you got that same experience? Absolutely. So now that you’re making that experience good, people are gonna want it. If you know that you can call an airline and change your flight easily, if you know that you can call a hotel and get any issue sorted, you’re gonna do it more.
And so the demand for that service is gonna increase. Even if the proportion of time a human being spends on each interaction is lower, the overall value to that business from the better experience means they’re gonna spend more on it. But there are other reasons I think it’s gonna cause labor shortages. Another is to think about jobs that exist today that didn’t exist a hundred years ago. Software engineer, right?
It’s not like those people just sat at home and did nothing. We invented new jobs for them to do. And the last part is, if we end up in a sort of deflationary economy because things are so inexpensive to build, a lot of people are gonna opt outta the workforce. They’ll be able to work part-time, work fewer days a week, or work fewer years of their life before they retire, and then go on and do whatever they want to do. That’s also gonna lead to labor shortages. So the fact that businesses can exist with fewer employees means there’ll be more businesses doing things that weren’t even profitable before, that many people will opt out sooner, and that there will be greater demand for services because the experiences will be better. And I think all of those will come together to cause enormous labor shortages.
EL KALIOUBY: Yeah. What are some new jobs that you’re seeing AI create?
ROSS: I mean, the most obvious is prompt engineer.
EL KALIOUBY: Yeah. Right.
ROSS: We’re actually looking to hire our first head of prompt engineering.
EL KALIOUBY: How do you hire for that position?
ROSS: So we’re trying to figure that out. But here’s some of the initial intuition that we have. Prompt engineering is more about communication and leadership than it is about doing sort of grindy work where you’re just like iterating and tweaking. So can you communicate clearly? Can you visualize what you’re trying to get at the end?
The way that we look at it is, it used to be that you would have hardware engineers that would wire things up, and then this wacky thing called software engineers, which at first people were like, that’s not a real job, right?
You just sit there and you type stuff. Like what? And then eventually people became full-time software engineers, but at first they had to do a little bit of hardware engineering in order to make the computers work. And then what we see happening next is right now prompt engineering involves a little bit of software writing, but eventually it’s gonna be purely through language.
Now imagine anyone who can speak can now create their own business.
It’s gonna unlock billions of people. There’s what, 1.4 billion people in Africa, and about the same number in India, right? All of a sudden they will be unlocked.
EL KALIOUBY: Yeah. It’s very exciting. AI can unlock human potential. But with more access, how can we make sure that our data is safe? Jonathan thinks the answer is open source. That and more after a short break.
[AD BREAK]
How open source models are reshaping competition and trust
EL KALIOUBY: So let’s talk about DeepSeek. Obviously it rocked the AI world, primarily because they were able to build a model that has very similar capabilities to OpenAI and other foundation models out there for a fraction of the cost. And of course, they open sourced the model. I personally think this is great news because it makes AI more efficient and more accessible, which only increases demand. But you guys called this a game changer moment, and I’m curious about your thoughts around DeepSeek. You also, I think, offer it through Groq Cloud. So yeah, tell us your thoughts about all of this.
ROSS: So for about six months before DeepSeek released their model and the distilled versions, there was a lot of murmuring in the venture capital community of are the LLMs going to be commoditized? And the general sentiment was yes.
EL KALIOUBY: Meaning that it doesn’t really matter anymore what LLM you’re using – they essentially all do the same thing.
ROSS: What DeepSeek really did was end any question about whether or not these models had been fully commoditized. At this point, as far as I know, no one is seriously entertaining building a model company based purely on the quality of the model, right? If anyone was still harboring the notion that that was a good business, DeepSeek’s release ended it. Now everyone’s rushing to open source these models. And if you haven’t open sourced your model yet at this point – since it was never really a moat to begin with – it’s kind of, what’s wrong? Why are you holding it back?
And what DeepSeek also did was publish a lot of the details behind even how they ran the models. They even shared their revenue numbers and their profit margins. So it showed, hey, you can make money doing this. It’s really changed the game from being a little more closed to being a little more open.
EL KALIOUBY: Yeah. There’s always been this tension between open source and proprietary, right? Like in software. Do you think it’s important for these models to be open source? And how does that help the company that’s open sourcing its models, but also how does it accelerate innovation or not?
ROSS: Well, I’d look at the word important and ask: important to whom, right? Instead of important, I would focus on: can you compete with a closed source model if there’s a bunch of open source models? And open always wins. Always. People have shifted from thinking that open is full of vulnerabilities and lacks quality to thinking that open is better, safer, and so on. So now the onus is on those doing closed models to prove that their models are as good. There’s just this groundswell of support for open.
EL KALIOUBY: Yeah. Let’s talk about safety for a second and trust, right? So for example, I’ve been playing with the DeepSeek model and, at the end of the day, I don’t know if I trust a Chinese-based company. I don’t know where the data’s going. I don’t know what they’re doing with the data. And so I think as we continue to see more and more applications of AI, I am very interested in the question of safety and governance and just responsible AI. Curious what you think about that.
ROSS: Yeah, so safety is a complex topic, particularly around the Chinese models. There are two different things to worry about. The first is: where’s your data going? If your data’s going into China, it’s not like these companies have any way to say no to the CCP. If the CCP asks for the data, they have to give it. So you have to assume that any query you send to the DeepSeek service itself, you might as well be sending straight to the CCP with your name right on it. And so David Sacks, the AI czar for the US, advocated for using a couple of different US companies that are running the DeepSeek model, so that way the data’s not going to China.
And Groq – he said that we’re one of those, so you can use us. We actually delete all queries after they’re done, so we don’t retain your data.
EL KALIOUBY: Interesting.
ROSS: But there’s another concern with the Chinese models, which is that they’ve also been trained to answer certain questions a particular way. Like, tell me about Tiananmen Square, or how do you treat the Uyghurs? If you ask these queries, they’ll give answers like, the CCP treats all people with equal respect. But the bigger concern there isn’t the censorship – that’s not great, but worse is: what if they intentionally bias the model? Imagine the CCP says, we want this person to win the next election, and then all the answers become, well, you should vote for this person for this reason. Do we want to give that kind of control to the CCP? I would say no.
EL KALIOUBY: Yeah, absolutely. How do you think we should be building guardrails into these systems and these models?
ROSS: Guardrails are a little bit different than dealing with the biases. On the guardrail side, one of the things about these LLMs is remember that they’ve been initially trained on the internet. And the internet doesn’t necessarily have the most eloquent, most subtle, most nuanced content, right?
But interestingly, the more you train the models and if you give them just a little bit of fine tuning, they start to act more like rational, intelligent entities. So it’s probably not too surprising that as the models get more intelligent, they actually start acting more intelligent. They start treating things with more subtlety and nuance, they have more creativity, more understanding. They can even help people resolve issues. And so I suspect that while it does take work, the work actually gets easier, the smarter the models get, not harder.
Why universal AI access and human agency matter together
EL KALIOUBY: Okay. Two more questions. One is, you’ve said in the past that down the line, at some point in the future, you would love to offer free access to Groq around the world. Who would you want to offer this technology to and why? For free.
ROSS: Well, first of all, it’s not down the line. We already do. Our mission is to drive the cost of compute to zero, because we want everyone to have access, right? Not just the wealthiest in society. We’re worried about a concentration of compute power. What we want is for everyone to have access to this. Just imagine if you had on-demand access to a thousand PhD students to help you solve a problem, but the person you’re competing against doesn’t. That’s gonna cause an imbalance.
So our goal is to make sure that everyone gets equivalent access to AI. Now, of course, it has a cost. We can’t give all of it away for free. But once you start getting into the sort of token rates that are required to run a business, then we start charging. It’s a little bit like electricity, right? If you plug your phone in at a hotel or a restaurant, no one’s gonna charge you for that. On the other hand, if you’re trying to run a data center, you’re gonna get charged.
EL KALIOUBY: Right. Yeah. That’s awesome. Okay. Final question. And we were starting to kind of allude to it a bit. I really believe that AI ought to be applied to unlock human potential, but I also think a lot about, okay, so if AI is gonna be smarter and even nicer – more creative, more empathetic than humans – what does it mean to be human in the age of AI?
ROSS: And I think that’s something we’re gonna have to figure out. And so one of our core missions internally is to preserve human agency in the age of AI. Think about it this way. Think about someone who’s very affluent and has had children and how that affluence is a benefit, but sometimes it’s also a hurdle, right? There’s a lack of mission. There’s a lack of sense of purpose because you could literally sit on a couch all day. You have a trust fund, right? And as a society, I am less worried about AI taking over. I’m less worried about AI taking jobs. I’m more worried about what happens when we don’t have to work and how we’re gonna find our own purpose, how we’re gonna also preserve our own agency. Because I’m worried that we will give our decision making authority over to AI because decisions are hard. And we will probably hand over simple decisions that will be a wise thing to do, because you can make a finite number of decisions in a day, right? But the important decisions, we should continue making those ourselves. And I think as a society, we’re gonna have to figure this out.
EL KALIOUBY: Yeah, super fascinating, right? Like how do we continue to have that agency, but also the motivation and a sense of purpose. Great way to end our interview today. Thank you, Jonathan, for joining us. Super fascinating.
ROSS: And thanks for having me.
EL KALIOUBY: Purpose and profit. Not all multi-billion dollar companies are thinking about marrying the two. But the ones that do are going to have the biggest impact.
As AI becomes more efficient, there will be more demand for it.
Demand will mean more users, which will mean we’ll need more AI chips for training and for inference. And we’ll also need more energy to run these chips – but that’s for a different conversation.
So, this means two things. One, there’s lots of room for innovation and disruption across the entire AI tech stack.
And two, unless we’re proactive, compute will only be available to a select few. This is where Groq comes in. It’s committed to ensuring that that doesn’t happen.
Sure, Groq is partnering with entities like Saudi Arabia and others to scale their AI platform. But they also want to bring AI compute to the masses – and they want to do that, in part, for free.
On Pioneers of AI, I’m so excited to continue featuring entrepreneurs passionate about democratizing access to AI, so that we can all reap the benefits.
Episode Takeaways
- Groq founder Jonathan Ross says speed is the whole ballgame in AI, because once users experience low-latency answers, there is simply no going back to slower systems.
- Ross traces Groq’s origin to his work on Google’s TPU and a conviction that AI compute should not be concentrated, leading him to build chips optimized for fast inference.
- Drawing on AlphaGo, Ross argues faster compute can improve model quality itself, while Groq’s global rollout and Saudi deal show how inference has become a worldwide race.
- Rather than taking on Nvidia head-to-head, Ross says Groq is focused on cheaper, lower-latency inference, with an architecture and software stack designed to move faster than rivals.
- The conversation broadens to access and impact, with Ross arguing open models, affordable compute, and free entry-level usage are essential if AI is to expand opportunity without eroding human agency.