Etched is ready to disrupt the AI chip market, with Gavin Uberti
Gavin Uberti thinks that his company Etched can be the “biggest company of all time.” His competitive advantage: faster, specialized AI chips that drive down the cost of inference. In this episode, we explore how cheaper chips could democratize AI, the big bet Uberti is making on transformer models, and how Etched is standing up to chip giant Nvidia.
About Gavin
- Co-founder & CEO of Etched, building transformer-first AI chips to challenge NVIDIA
- Raised a $120M Series A in 2024 to bring Etched's systems to market
- World champion for mathematics in high school
- Developed AI model performance software at OctoML
- Guest lectured at Columbia University and TinyML
Table of Contents:
- Why dropping out led to a deeper view of computing
- Seeing the hidden inefficiencies inside modern chips
- Why transformers became the winning AI architecture
- Betting on a chip built only for transformer models
- How Sohu aims to cut latency and unlock new AI experiences
- Why selling full AI systems matters more than selling chips alone
- Lowering inference costs to make advanced AI widely accessible
- Building ethical AI by making powerful models affordable for everyone
- Competing with incumbents while preparing for sovereign AI needs
- Why the public still underestimates how capable AI models have become
- Episode Takeaways
Transcript:
EL KALIOUBY: As we continue to see advances in AI models – whether it’s DeepSeek or the latest Grok model – we’re also seeing an increase in the demand for compute. As a result, the race to create the winning chips that can power cutting edge AI innovations shows no sign of slowing down. At the forefront of this race is of course NVIDIA, which has over 80% market share when it comes to AI chips.
But the world of AI chips is ripe for disruption. And there are many other players in this world who are not making headlines every day. Gavin Uberti is carving out a space in the market with his company Etched, and its specialized chips that are optimized for transformer models, the de facto architecture for most AI models we see today.
And Gavin’s very aware of who his competitors are and what he’s up against.
GAVIN UBERTI: Well, our competitors are very good at what they do, and I think a lot of companies have tried to do what NVIDIA does, and they’ve turned out to be worse. There have been other AI chip companies who have tried to build flexible AI chip products that have been cheaper than NVIDIA’s, but not better.
So if you want to beat NVIDIA, you can’t play the game the way they’re playing it. I think specialization is one of the few solutions that can give you an advantage. We are not going to beat NVIDIA at almost everything they do. But at one thing, transformers, we will dominate.
EL KALIOUBY: In this episode we’ll talk about what that domination could look like, the big bet Gavin is making on transformer models, and how AI chips can help lay the groundwork to making AI accessible to as many people as possible.
I’m Rana el Kaliouby and this is Pioneers of AI – a podcast taking you behind-the-scenes of the AI revolution.
[THEME MUSIC]
Before Gavin and I dove into our conversation about chips and transformer models, I wanted to learn more about his story leading up to the founding of Etched. Following the legacy of tech founders like Bill Gates and Steve Jobs, Gavin also dropped out of college. And I … had some feelings about that.
Well, I have a confession to make.
UBERTI: A confession.
Why dropping out led to a deeper view of computing
EL KALIOUBY: A confession. Yes. So my daughter is a senior at Harvard and in preparation for this interview, I was kind of simulating the scenario where she comes to me and she was like, mom, like I’m dropping out to do my own thing. And I was pretty uncomfortable.
Like, I don’t know if I was totally on board with that idea. You of course dropped out of Harvard to start Etched. And I have to say, I am totally on board with that idea because I’m a proud investor in the company, but I’ve never actually heard your origin story. How did that whole thing unfold while you were at Harvard? What were your parents’ reactions or other people important to you in your life? Like, tell us more.
UBERTI: Well, I actually did not drop out to start Etched, I just dropped out.
EL KALIOUBY: Okay. Okay. Even cooler.
UBERTI: So I was a world champion for mathematics in high school, went to Harvard to study mathematics, loved it there. Unfortunately, the world is very big, and most of that’s not at Harvard. So around a semester and a half, or a year and a half into college, I got the idea of, hey, I’m gonna drop out, I’m gonna live as a digital nomad, I’m gonna travel all around South America and Southeast Asia. For the year I was on a flight about once every four days.
EL KALIOUBY: That is crazy. And what was your parents’ response?
UBERTI: Oh, well, I paid for it myself, so I didn’t really care. To actually pay for this, which is not cheap, I got a job doing microkernel development.
EL KALIOUBY: Microkernel development: basically, it’s developing software at the operating system level.
UBERTI: Back in 2022, there was a lot of demand to run AI models very efficiently. And since AI models are powered by matrix multiplication and convolution operations, somebody has to write the code that runs these very efficiently. So I got a job doing this for ARM microprocessors. And the way that you do it, the way that you get every last bit of performance out of these programs, is that you write them not in a programming language, but in assembly.
EL KALIOUBY: Yeah, exactly. Oh my God. I remember assembly. I studied that when I was a computer science undergrad. It’s basically the low level language of computers. It’s the opposite of prompt engineering.
UBERTI: Yeah, so I got to know the Cortex M4 and M7 very well. They are two of the cores made by ARM that are on some of the cheapest devices that you can buy.
Seeing the hidden inefficiencies inside modern chips
EL KALIOUBY: Cheap, but Gavin found ways to get more performance out of them. He then traveled the world and worked on optimizing machine learning code, just months before AI development began to ramp up due to generative AI. And during his travels he noticed a gap.
UBERTI: I love traveling, but the thing you really realize when you write this assembly code is that only a very tiny part of each one of these chips is used for doing the math. Most of the job that you do is not programming the math blocks, it is getting the data to the right place at the right time. That is much more challenging than the math operations themselves.
In fact, on a chip like the H100 from NVIDIA, only around 3 percent of the chip is spent on matrix multiplication blocks.
EL KALIOUBY: Meaning that the H100s are highly flexible chips that can be used for AI and non AI functions. Which had Gavin thinking: could a more specialized chip, dedicated to AI, be more energy efficient and faster than a Graphics Processing Unit?
Okay. So you’re doing all that and you have this realization. Then what happens?
UBERTI: Well, at the same time, I become convinced that we are going to see a little bit of a plateau of AI architectures. You think back many years ago, longer than I’ve been alive, to CPU instruction sets, there used to be many different kinds of instruction sets. But x86, the one that powers most Intel processors, sort of reaches one critical point of popularity.
Where it had enough adoption, that software got written for x86, which made it get more adoption, which meant there was more software, and led to this cycle where x86 became the dominant instruction set. And I thought the same thing was going to happen for Transformers. They were pretty good in 2022, but there was not really a killer product yet. So I thought they’d get better, more hardware, more software support, better products, and it becomes this self reinforcing cycle.
Why transformers became the winning AI architecture
EL KALIOUBY: Okay, so I want to pause you here. So transformer models are basically a specific type of AI architecture or an approach to building these AI models where it really leverages sequential data. And what is so transformative, no pun intended, about these transformer models is that they are able to hold a lot of information in memory and process all of this information in parallel. And transformer models are now at the core of ChatGPT and a lot of the other generative AI products, actually also multimodal, right? They power some of the vision and video processing applications as well.
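To make the "process all of this information in parallel" point concrete, here is a minimal, pure-Python sketch of scaled dot-product attention, the core operation inside a transformer. This is an illustrative toy, not Etched's or any production code, and the sequence values and sizes are made up:

```python
import math

def softmax(xs):
    """Turn raw scores into weights that sum to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention over lists of vectors.
    Every position scores against every other position, so the
    whole sequence can be processed as one big matrix operation
    in parallel on real hardware."""
    d = len(Q[0])
    out = []
    for q in Q:
        # similarity of this query to every key, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        w = softmax(scores)
        # output = weighted mix of all value vectors
        out.append([sum(wi * v[j] for wi, v in zip(w, V)) for j in range(len(V[0]))])
    return out

# Toy 3-token sequence with 2-dimensional embeddings (made-up numbers).
X = [[0.1, 0.2], [0.3, -0.1], [-0.2, 0.4]]
Y = attention(X, X, X)  # self-attention: queries, keys, values all come from X
print(len(Y), len(Y[0]))  # prints: 3 2 (one output vector per input position)
```

The key property for hardware is that the score and mixing steps are dense matrix multiplications with no sequential dependency across positions, which is exactly the workload a specialized chip can accelerate.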
UBERTI: The reason why transformers won, I think, is not just because they’re very performant, but because they run really well on hardware.
So if you want to go build a new model that’s a transformer, it’s super easy. The software is there. The hardware supports it really well. If you want to do it for something else, you’re on your own.
You have to go write your own software. You have to go, well, accept the fact the hardware is not going to really support it all that well.
And I think that means that transformers are going to give better research results, which means there’ll be more investment in them, which means even better research results, which leads to this cycle of transformers continuing to win.
EL KALIOUBY: So the moat is basically the fact that it has become almost the de facto way of building machine learning models.
UBERTI: The moat is the fact that there’s a great software ecosystem, there’s a great hardware ecosystem, including us, and people know how to use transformers. And the last piece here, too, is that models are getting really expensive. Rocking the boat, trying new things when they cost $100k each, that’s a thing people can do. Rocking the boat at the hundred-million-dollar or billion-dollar scale? That’s much more challenging. I think that as models get more expensive, you can’t really risk moving away.
EL KALIOUBY: We recorded this interview before the DeepSeek news broke. DeepSeek, among many other things, proved that it is possible to train cheaper models. It also showed that at the moment the cost of inference is still high.
Since DeepSeek’s release, Etched has said that they’ve actually seen a lot more inbound interest from companies wanting to partner with them. Essentially, for AI usage to be commercially viable, companies need to reduce the cost of inference, something that Etched says it can do faster and more cheaply than GPUs.
Betting on a chip built only for transformer models
So you clearly made the bet on transformers, and you’ve basically decided to build a specialized chip that doubles down on these models and is optimized for them. So I’ll just take a step back to make sure that everybody’s coming along with us on this journey. An AI chip is essentially a computer chip, very much like the computer chip that powers your smartphone, but it’s specialized for AI and machine learning tasks: for example, processing large amounts of data, or processing images and videos, like you said, but also generating images and video. These chips tend to be optimized for the complex mathematical calculations that these approaches to AI rely on. The NVIDIA H100 is an example of a chip that is optimized for AI. But you’re doubling down even more and building a very specialized chip. We call that an ASIC, or application-specific integrated circuit, and yours is designed specifically for transformer models. Talk us through what that means.
UBERTI: Well, it means that we can’t run most kinds of AI models out there today, which is very scary.
EL KALIOUBY: Uh.
UBERTI: The Snapchat filters that power the dog ears or whatever, those aren’t transformers, those are convolutional models. The recommendation engines that power ads for Google and Facebook are not transformers either. But the text models like ChatGPT, and the image generators and the video generators, those are transformers. Unlike a GPU or a Google TPU, our chip cannot really be programmed in the same way. It is by design only able to run transformers, but in exchange for that, it can have much more compute in the same die area. And as a result, it can get much better performance: 500,000 tokens per second for a model like Llama 70B, dramatically outperforming NVIDIA H100s and B100s.
EL KALIOUBY: To what extent? Like how much more improvement in performance are you getting?
UBERTI: Well, I think the right way to compare is based on the numbers NVIDIA gives. They have this page on their TensorRT-LLM repo that says how many tokens per second you get for Llama 70B running on 8 H100s in FP8 precision, with 2048 input tokens and 128 output tokens. And you get around 30,000 tokens per second for this 8-chip system. Now, that’s not 30,000 for one user, it’s 30,000 across a number of different users. For a chip like ours, we will get more than 500,000.
EL KALIOUBY: Meaning that these chips are faster and have better performance compared to GPUs.
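As a quick back-of-envelope check on the two throughput figures quoted above (these are the numbers as stated in the conversation, not independently verified):

```python
# Throughput figures quoted in the conversation above.
h100_8x_tokens_per_sec = 30_000        # NVIDIA's published number for an 8x H100 system
sohu_claimed_tokens_per_sec = 500_000  # Etched's claim for its own system

speedup = sohu_claimed_tokens_per_sec / h100_8x_tokens_per_sec
print(f"Claimed speedup: about {speedup:.1f}x")  # prints: Claimed speedup: about 16.7x
```

That ratio is consistent with the roughly 16x figure Uberti cites elsewhere in the episode.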
After a short break, we get into what this actually means and how it gives Etched a competitive edge in the market. Stay with us.
[AD BREAK]
How Sohu aims to cut latency and unlock new AI experiences
So your first product is Sohu.
UBERTI: Yes, Sohu.
EL KALIOUBY: Describe what that product does and what are the key kind of advantages of Sohu over other chips out there.
UBERTI: Sohu is a transformer machine. It has way more compute than competing products, and as a result it can do things like deliver that 500,000 tokens per second, 16 times faster than H100s, some of the fastest AI chips on the market today. Additionally, you can get much better latencies, too. There are a few kinds of latency, so I’ll be precise: I’m talking about the latency between when you ask ChatGPT your question and when you get the first token of your response back.
That is the time to first token latency.
For traditional endpoints, you usually see around 300 milliseconds. That’s good, but too slow for a lot of use cases. You can’t do things like real-time video generation or real-time audio generation with that kind of delay.
NVIDIA’s TensorRT-LLM page says they can get Llama 70B running at FP16 precision, for a 2,000-token input prompt, with a time to first token of about 100 milliseconds. An eight-chip Sohu system will be six times faster than our competitors’ products. Now again, that’s not because we’re clever. That’s because all Sohu does is run transformer models. And it’s not just text. We see similar speedups on video generation, image generation, image processing, and multimodal models.
EL KALIOUBY: Speedup, or lower latency, is key. Of course everyone wants an AI chatbot that responds faster, but there are other reasons why latency is so critical. Think about an AI copilot in your car that’s sending data back and forth to AI models. Lower latency is so important here, especially if that copilot needs to make a life-saving decision.
But for Etched to implement this full vision, it needed to be more than just a chip company.
UBERTI: Our first product is a datacenter product. We’re competing with NVIDIA’s box, which is a datacenter box as well. That’s where most of the market is.
Why selling full AI systems matters more than selling chips alone
EL KALIOUBY: Yeah, talk a little bit more about your first kind of customers in this data center space. Because I saw somewhere that you described Etched as an AI infrastructure company as opposed to an AI chip company, and I thought that was really intriguing. But I think you’re getting to that in your answer around data centers.
UBERTI: Well, one key thing is, Etched does not sell chips. Etched sells solutions. Now, why do we do that? Is it because I like doing harder things? No. It is because I want to get these products in customer hands as fast as possible. And that means you have to build the chip. And build the circuit board that chip sits on. And build the server the circuit board goes into. And build a little bit of the rack that sits around that to make sure it’s cooled correctly and powered correctly. That’s harder, but it’s almost as important as the chip itself. When I say that we are more than an AI chip company, I mean that we sell systems.
EL KALIOUBY: Yeah. And that’s where the data center comes in, right? So if I’m an enterprise customer that is building all these AI models, I’m not just going to buy the chip directly from you. I’m going to buy an entire kind of solution, which includes the servers and yeah, tell us more.
Like, what would that look like? What would the solution look like?
UBERTI: I’ll nerd out a little bit, but we sell similarly to how other people like NVIDIA or AMD or Intel do. They sell at a couple of different levels. There is what they call L11, where they sell you a whole server rack: it has the servers, the cooling solution, and the top-of-rack switch all put together. Or you can buy L10, where you get just the server. And some very technical customers buy AMD and NVIDIA products at the L6 level, where you get the chip, the circuit board, and the baseboard all together.
EL KALIOUBY: Yeah, so I’m kind of thinking, what is the so what here? Like, why is this so exciting and really disruptive for the AI market? Is this going to help us adopt and deploy AI faster? Is it going to help us build these models quicker? Like, yeah. Why does this matter?
UBERTI: The future is here, it is just not evenly distributed.
EL KALIOUBY: That’s true. I believe in that.
UBERTI: The vast majority of people have never experienced GPT-4 level models. And when GPT-5 and GPT-6 and other next gen models come out, they are going to be very expensive.
Lowering inference costs to make advanced AI widely accessible
EL KALIOUBY: What do you mean by expensive, by the way? Define expensive.
UBERTI: I mean, they will cost a lot of money per token. As models get bigger, it costs more money to run them, and that’s why for a long time, and even still today, you can only get the premium GPT-4 model if you pay for the ChatGPT Plus subscription. You can get smaller, distilled versions like GPT-4o and GPT-4 Turbo in the free version, but the big ones, those are behind paywalls today. And I think that specialized chips, both those made by us and by others, are going to help decrease these costs by an order of magnitude and really democratize access to these huge foundation models. And while today it’s not a huge issue, it is going to become a big problem in the future. If next-gen models are 100x bigger, they might be 100x more expensive.
And we see solutions now with agents as well, that are very performant, but very expensive. One of the big coding agent companies today charges more than $10,000 per month just to go cover their own costs of how much it costs to run it with these tokens.
EL KALIOUBY: Yeah. Yeah.
UBERTI: For almost everyone besides the big tech companies, for this to really be an invention for the people, the cost has to come down. Now that said, I’m very confident it will. Many other inventions started very expensive. Cell phones, for example. And then once purpose built hardware came along, they got way cheaper.
EL KALIOUBY: Yeah. Yeah. I love this vision of democratizing access to AI. And I love that you’re optimizing on the training side of these models, but more importantly on the inference side. So every time I make a call to ChatGPT, it’s not costing a lot of money, but it’s also not costing the planet and the environment as much. That’s a big consideration.
But how can we democratize access to AI responsibly? That’s after the break. Stay with us.
[AD BREAK]
All right, so at this podcast and also as an investor, I’m very focused on this idea of human centric AI. AI that’s good for the planet, good for people, and is ethical and responsible. How do you think about building AI responsibly? What does that mean for you and for Etched?
UBERTI: For me it means keeping the cost low. I think we live in a bubble. Silicon Valley is one of the richest places on earth. We can afford our $20 ChatGPT subscription, and our $20 Anthropic subscription, and our $30 Midjourney subscription. And most people in most of the world can’t. Look at iPhones, for example: while iPhones are very popular here, they only have about 25 percent market share worldwide, because almost nobody else can afford them. So for me, building AI ethically means building products that can be used not just by the select few here in Silicon Valley, but all across the world. Building models for not just 10 million, but 8 billion people.
Building ethical AI by making powerful models affordable for everyone
EL KALIOUBY: All right, so you are disrupting the AI chip industry and the AI chip space, including taking on big competitors like NVIDIA or ARM or Google. What is stopping NVIDIA from basically building a specialized chip optimized for transformers?
UBERTI: Eventually, they will. But the fact is, as is common in deep tech, it takes a very long time to get these products to market. It doesn’t matter if you have one dollar or a hundred billion dollars. There are just things that have to go after other things. So, there are long lead times. And we started really early.
We began this company before ChatGPT came out, back when it was really risky to bet on transformers. Not only because the models weren’t as solidified back then, but because there was no use case, and that lead has put us in the front and is going to make us first to market. And yes, if we go sit on our butts, then eventually somebody will go build a better specialized ASIC. But just like you see in CPUs, or in networking chips, or in Bitcoin mining chips, whoever’s in front can usually keep innovating and usually stay ahead like that.
EL KALIOUBY: You’ve said that Etched could be the biggest company of all time. And I have to say, I love this energy and conviction from founders. It is one of the qualities I look for when I decide whether or not to invest in a company. But also let’s be real, right? Like the journey of running a startup is, I kind of liken it to an emotional rollercoaster. Some days are amazing and you know, you’ve just closed your series A round and the world looks awesome. But there’s a lot of challenging days as well. What are some of the obstacles or biggest challenges you’re facing?
UBERTI: Well, the fact is, we are not just building a specialized chip. We are building a chip that has more compute than anything else ever built. And not just more memory or memory bandwidth, but more off-chip I/O bandwidth than any other AI chip ever produced. And that leads to a lot of issues in the chip design, the platform design, the package design, and the circuit board design that have to be solved. But it’s well worth it.
EL KALIOUBY: So that’s a lot of challenges from the R&D perspective, like the actual development. Yeah, figuring out how to do it as efficiently as possible, or, you know, how to even build these.
UBERTI: I mean, even at all. We are building a product on the cutting edge of technology. But I have such a fine team working with me. My CTO, my chief architect, and my VPs of hardware and platform are some of the most talented people I’ve ever met. And it is a privilege to come and work with them every day.
EL KALIOUBY: That is awesome. I love that. So you have announced your Series A, $120 million Series A round. What are you using the proceeds for and what are some of the milestones you’re aiming for?
UBERTI: We want to make sure that we are at a point where we’re generating revenue. Serious revenue. Before we think about, do we even need more money after this? And as a result, we’re going to be spending a lot of that Series A funding, not on R&D, but on actually building these products. Even though margins of chips tend to be pretty good, it still takes some amount of fixed capital to go build some volume of products. So we’re doing a much larger than normal risk production run.
EL KALIOUBY: Okay, awesome. Who are the early customers?
UBERTI: I can’t name a whole lot of names. They’re under NDA.
We have some strategic investors. Two Sigma, for example, is one of our strategic investors. We’ve gotten a lot of traction with some of these trading firms. One of the exciting use cases of our product is that you can get a much faster time to that first token. So if you want to go read a bunch of text and make, say, a trading decision based upon that, that’s one possible use case. But I think that’s a small market segment, not the one I’m most passionate about. The companies that get me the most excited are the AI companies that are building products that could not exist any other way: video generation, or some very complicated agent workflows. That’s the stuff that’s going to change the world.
Competing with incumbents while preparing for sovereign AI needs
EL KALIOUBY: By the way, in a separate interview, we were talking about sovereign AI, helping nations develop their own AI infrastructures, data centers, models, and employing their own kind of local labor force to build AI. Is that something you focus on as well?
UBERTI: I can’t comment too much there, but I do think that investing in sovereign AI is an important thing for governments. One very troubling fact is that LLMs speak English really well. LLMs speak other languages okay. And if you are, say, a government from a country that has a very rich cultural heritage in a language that’s not English, then.
EL KALIOUBY: I’m originally from Egypt, right? Like we speak Arabic in the Middle East. So, yeah.
UBERTI: Now, I will say, people are actually investing quite a bit in Arabic language models. I think people over there realize how existential this is for the language. But I think a lot of other languages, especially European languages, are not being adequately funded here. And certainly, if you think about any language that’s not in the top 20, it’s very possible that the leading models don’t speak it at all. So, do you build your own model? Do you try to convince some other OpenAI-esque company to support yours? What happens if there’s not enough data in your language? I think that for non-English-speaking countries, making sure their cultural heritage is preserved in these models is going to be essential, and that means spending a little bit of money.
EL KALIOUBY: Yeah, I agree with that. What is a common misconception you think people have about AI?
UBERTI: The biggest misconception is that the current chatbots suck.
EL KALIOUBY: Mm.
UBERTI: For a long time, the default chatbot on OpenAI was GPT-3.5. And GPT-3.5 is not a very good model, not by today’s standards. People will try it, say, hey, this is not really all that useful, get turned off, and do something else. But if you’re part of the select few, you can go pay $20 a month and get to see GPT-4. And you realize, wow, this is progressing really fast. I think the same thing is going to happen for GPT-5. It’s going to come out. Silicon Valley will be wowed. I will be wowed. And most people will say, hey, the model that’s free, that I can use, is not any better.
For most people, the world is not moving very fast here. I think we’re eventually going to be at a point where we have human-level intelligent models, and people still really won’t notice, because those models aren’t affordable and they’re not part of their daily lives.
The big misconception I see is that these AI models aren’t reasoning and aren’t smart. I think they absolutely are. I think that before you make that judgment, you should go try the ones that are good.
Why the public still underestimates how capable AI models have become
EL KALIOUBY: Yeah, you know, I also think you’re kind of touching on a very important point, which is the models we see today are the worst they’re ever going to be, right? All these things, they’re going to keep getting better and better and better very fast.
UBERTI: A lot of people look at the GPT-4 level models and say, this is a bubble. This is not human level intelligence. This thing is good for reading emails. And wow, the goalposts move really far, really fast. Two years ago, it was a miracle that these things could speak English at all. And now we say, ah, but it still gets some medical questions wrong. It is still less reliable than the best doctors on earth. And I guarantee when the GPT-5 models come out, the goalposts will move again.
EL KALIOUBY: Yeah. Love it. Amazing. Well, thank you, Gavin, for joining us on the show. This was awesome.
UBERTI: It’s a pleasure to be here.
EL KALIOUBY: These recent developments in AI have only led to even more demand for this technology. But to build commercially viable AI solutions, we need compute to be faster and cheaper.
And we need to be thinking about how we can make AI that is sustainable and accessible at every level of the stack, starting with the chips. Cheaper chips mean that start-ups who may not be able to afford or even have access to larger GPUs can get a piece of the pie. Plus, more energy efficient hardware means less environmental impact on our planet.
All of this means that there’s so much opportunity to innovate in this space – the same way that Etched is doing.
Episode Takeaways
- Rana el Kaliouby opens with the high-stakes AI chip race, then introduces Etched founder Gavin Uberti, who argues the only real way to challenge NVIDIA is through transformer-specific specialization.
- Gavin shares an unconventional path from Harvard math student to globe-trotting engineer, where writing low-level assembly code convinced him that chip design wastes too much space on flexibility instead of pure AI performance.
- That insight led to Etched’s central bet: transformers have become the dominant AI architecture, so a purpose-built ASIC can dramatically outperform general GPUs on speed, throughput, and inference cost.
- Uberti says Etched is really selling full AI infrastructure, not just chips, with its Sohu system designed to slash time-to-first-token and unlock demanding use cases like real-time video, audio, and advanced agents.
- By the end, the conversation turns to the bigger mission: making frontier AI affordable beyond Silicon Valley, preserving language diversity through sovereign AI, and lowering costs so powerful models reach billions, not just premium users.