CoreWeave’s Brannin McBee on the future of AI infrastructure, GPU economics, & data centers | E1925

Episode Summary

In this episode, Brannin McBee, Chief Development Officer and co-founder of CoreWeave, discusses the rapid evolution and challenges of AI infrastructure in response to the surging demand for AI technologies. CoreWeave, a company initially rooted in cryptocurrency mining, has pivoted to providing specialized cloud infrastructure for AI and parallelizable workloads, distinguishing itself from traditional cloud services designed for serializable tasks. McBee highlights the company's significant growth, with plans to expand its data center operations across North America, underscoring the insatiable demand for AI compute resources.

The conversation delves into the economics of GPUs, particularly NVIDIA's H100s, which are central to AI model training and inference. McBee explains the capital-intensive nature of setting up and operating data centers equipped with these GPUs, emphasizing the importance of efficient power use and innovative cooling techniques to manage the substantial heat these systems generate. Despite the high costs of this infrastructure, demand from companies seeking to train AI models and perform inference at scale continues to surge, and CoreWeave's services are fully booked.

McBee also touches on the broader implications of AI infrastructure development, including the potential for AI to revolutionize sectors such as advertising and healthcare. He predicts that generative AI will play a pivotal role in creating highly personalized, effective advertising, leveraging vast amounts of data to tailor content to individual users. He also envisions AI contributing significantly to advances in healthcare, though he anticipates this will become more pronounced in the latter half of the decade.

Throughout the discussion, McBee underscores the challenges of scaling AI infrastructure to meet the explosive growth in demand. He points out the limitations of existing cloud infrastructure, which was not designed for the parallelizable workloads AI requires, and the necessity of building new, specialized data centers. The conversation concludes with McBee expressing optimism about the future of AI and its capacity to drive innovation across industries, despite the logistical and economic hurdles that lie ahead.

Episode Show Notes

This Week in Startups is brought to you by…

OpenPhone. Create business phone numbers for you and your team that work through an app on your smartphone or desktop. TWiST listeners can get an extra 20% off any plan for your first 6 months at http://www.openphone.com/twist

Gusto is easy online payroll, benefits, and HR built for modern small businesses. Get three months free when you run your first payroll at Gusto.com/twist.

Northwest Registered Agent. When starting your business, it's important to use a service that will actually help you. Northwest Registered Agent is that service. They'll form your company fast, give you the documents you need to open a business bank account, and even provide you with mail scanning and a business address to keep your personal privacy intact. Visit http://www.northwestregisteredagent.com/twist to get a 60% discount on your next LLC.

*

Today's show:

Brannin McBee joins Jason to discuss CoreWeave’s transition from cryptocurrency to AI (3:36), the energy dynamics of GPUs (14:04), innovations in data center cooling (19:28), and the future challenges and impacts of AI infrastructure on industries like search, advertising, and e-commerce (45:36).

*

Timestamps:

(00:00) CoreWeave’s Brannin McBee joins Jason

(3:36) The founding story of CoreWeave and the transition from the cryptocurrency market to AI

(12:49) OpenPhone - Get 20% off your first six months at http://www.openphone.com/twist

(14:04) Energy reliance of GPUs, the future of data centers, and the cost and efficiency of GPU cloud infrastructure

(19:28) Direct-to-chip liquid cooling and other cooling solutions for GPUs

(24:52) Demand trends for GPU capacity and the growth of inference linked to user growth

(29:06) Gusto - Get three months free when you run your first payroll at http://gusto.com/twist

(30:28) LPUs Vs. GPUs

(34:13) Dominance of Nvidia in training models, implications of open source chips and chip architecture

(39:01) Northwest Registered Agent - Get a 60% discount on your next LLC at http://www.northwestregisteredagent.com/twist

(40:00) Challenges in training large language models and the use of infrastructure as a weapon by big tech

(45:36) The potential impact of AI on search engines, advertising sector, and ecommerce

(52:46) Timeline for supply-demand balance in AI infrastructure and the operational feat of running multiple data centers

*

Subscribe to This Week in Startups on Apple: https://rb.gy/v19fcp

*

Follow Brannin:

X: https://twitter.com/branninmcbee

LinkedIn: https://www.linkedin.com/in/branninmcbee

*

Follow Jason:

X: https://twitter.com/Jason

LinkedIn: https://www.linkedin.com/in/jasoncalacanis

*

Thank you to our partners:

(12:49) OpenPhone - Get 20% off your first six months at http://www.openphone.com/twist

(29:06) Gusto - Get three months free when you run your first payroll at http://gusto.com/twist

(39:01) Northwest Registered Agent - Get a 60% discount on your next LLC at http://www.northwestregisteredagent.com/twist

*

Great 2023 interviews: Steve Huffman, Brian Chesky, Aaron Levie, Sophia Amoruso, Reid Hoffman, Frank Slootman, Billy McFarland

*

Check out Jason’s suite of newsletters: https://substack.com/@calacanis

*

Follow TWiST:

Substack: https://twistartups.substack.com

Twitter: https://twitter.com/TWiStartups

YouTube: https://www.youtube.com/thisweekin

Instagram: https://www.instagram.com/thisweekinstartups

TikTok: https://www.tiktok.com/@thisweekinstartups

*

Subscribe to the Founder University Podcast: https://www.founder.university/podcast

Episode Transcript

SPEAKER_00: You look at the existing cloud infrastructure that was built over the last decade, it was built for serializable workloads. It wasn't built for parallelizable workloads. And it's like you're having to rebuild the cloud, so to say, and you're having to rebuild physical infrastructure at the pace of AI software adoption. It's a mind-blowing concept, right? Because AI software is being adopted at the most rapid scale of any technology that we've ever observed. I mean, we're building at, I think it's 28 data centers this year across North America. We're one of the largest operators of this infrastructure in the world. And we are unable to keep up with demand. And we really don't see that subsiding for years to come.

SPEAKER_02: This Week in Startups is brought to you by OpenPhone. Create business phone numbers for you and your team that work through an app on your smartphone or desktop. Twist listeners can get an extra 20% off any plan for your first six months at openphone.com slash twist. Gusto is easy online payroll, benefits, and HR built for modern small businesses. Get three months free when you run your first payroll at gusto.com slash twist. And Northwest Registered Agent. When starting your business, it's important to use a service that will actually help you. Northwest Registered Agent is that service. They'll form your company fast, give you the documents you need to open a business bank account, and even provide you with mail scanning and a business address to keep your personal privacy intact. Visit northwestregisteredagent.com slash twist to get a 60% discount on your next LLC.

SPEAKER_01: All right, everybody, welcome back to This Week in Startups. We've got a great guest for you today. You may have been wondering who's buying all of these NVIDIA H100s and how are people getting access to all of this hardware? Well, there's a couple of companies that got to the hosting of AI and GPUs early. One of those companies is CoreWeave. They started in the space, I believe, doing a lot of crypto, where miners were renting GPUs from their cluster. And... Fortune favors the bold. They were in the catbird seat when the AI revolution happened, and everybody decided, well, if they've got to train their own models, they're going to need a bunch of NVIDIA's hardware and other people's hardware. And we'll talk about that today. And they've since grown to a massive scale. For those of you who don't know, NVIDIA's market cap has increased more than 7x, from $300 billion to $2.3 trillion, if you've been living under a rock and you haven't been watching this. It's because people want access to these chips. Welcome to the program, Brannin McBee, who is the CDO and co-founder. What does CDO stand for?

SPEAKER_00: Chief Development Officer. So my role is raising capital for the business. I interface with equity and debt participants for the company and help fuel the growth of the business.

SPEAKER_01: And this is a very capital-intensive business. You spend a lot of money on GPUs and setting up infrastructure. The company's been around for just under a decade, am I correct?

SPEAKER_00: Yes, we founded the company in 2018.

SPEAKER_01: Got it. And am I also correct that you were supplying GPUs largely to crypto and Bitcoin miners and this cohort of individuals?
SPEAKER_00: Or that was the beachhead market? It was absolutely the beachhead market. So I can take a couple steps back on our founding story. So it's myself, my two co-founders. We're from the institutional commodity trading sector. So we're risk managers by background. We're from hedge funds. Finance guys is probably the best way to look at it. But we were finance guys who were heavily data-oriented. We worked in this commodity sector where you can actually solve for price. You can figure out supply-demand, and... there is a dollar-per-barrel of oil, so to say, that solves market dynamics. So we've always worked with a lot of compute. We've worked with a lot of software. And the crypto space was interesting because it was an arbitrage opportunity. There's a very discrete input price cost of power. And you could model the revenue very efficiently because there were no customers, right? You were just participating in this network. And thus, if you were just to sell the revenue from the crypto mining proceeds every day, you could effectively qualify it as an arbitrage opportunity. And that was interesting to us. But it wasn't as compelling as a large business, because at the end of the day, all you're going to do is chase the price of power lower. That's the only advantage you can really extract, unless you expand into other markets. So when we were looking at the cryptocurrency space, it wasn't Bitcoin mining that we were interested in. It was Ethereum mining and GPU-oriented mining, because a Bitcoin miner, an ASIC, things that are produced by entities like Bitmain, they can only do that one thing. They can only participate in Bitcoin, and they're very good at it. But a GPU, well, it can do lots of things, including running AI workloads. So we started in the crypto space, but it was always with this idea. And we had no idea how complicated an idea it was at the time, but started with the idea that, well, you could do crypto and other things. Right. Started there.

SPEAKER_01: The other use of these is, I guess, running video games in the cloud. Is that correct? Did cloud video games become a real market, or do people who are into video games just buy themselves an Alienware, Dell, whatever, and be done with it?

SPEAKER_00: I think it's more the latter, that they use it more for that. We certainly don't see that demand for video game streaming. I think a few of the hyperscalers tried to launch into that market, and I don't believe that there has been substantial demand.

SPEAKER_01: Got it. And so when we look at crypto, that market kind of fizzled right as AI was starting to boom. So you were able to sort of just navigate that? Or is the crypto still going on and people are still using your services for Ethereum and being part of that network? Or is that just too hard of an arbitrage now, because people in China have stolen electricity, we hear, or, you know, their friend runs the hydro dam, so they run an extension cord, so to speak, over to their warehouse with a bunch of servers in it, and you're up against people getting zero-cost input electricity?

SPEAKER_00: It's a great question, and yes, I'm extremely excited that we're not involved in that cryptocurrency market anymore. We haven't been involved for a number of years at this point. We actually started making this transition into the cloud infrastructure market in 2019. We hired Peter Salanki, who was recently elevated to our chief technology officer position, in 2019 to build a cloud for us. And it's not just plugging in GPUs and having users come access it. It's this...
really complicated software stack that runs the cloud. Or in other words, it's an orchestration environment that enables users to access and use our infrastructure. And it's very different from the way the hyperscalers built theirs, because they built for hosting websites and storing data lakes. And we built our cloud as a no-compromises engineering solution for running AI workloads and highly parallelizable workloads. And there's engineering decisions you make in doing that that you wouldn't make for hosting websites. And that's allowed us to, I would say, overperform in the market with a product that really doesn't have competition. We started in 2019.

SPEAKER_01: Yeah, the software layer to provision these H100s, A100s, whatever people are using, that's a key part of the puzzle that you have to build, you have to master. And AWS's, Google Cloud's, Azure's, everybody's, are they all slightly different, using their own provisioning software, or is there some open source standard there for doing all that?

SPEAKER_00: That's exactly correct. We use and contribute to open source as much as we can, but we have a proprietary orchestration solution that looks different than the hyperscalers'. My favorite analogy for this actually comes out of the automobile sector, where at the end of the day, everyone produces vehicles the same way, right? From research, design, scaling, servicing, it's the same sort of product with different badges and different colors on it, right? And it's been that way for 60-plus years. And then in the 2000s, a company came along and said, well, what if we started with a blank slate and designed this process today? And, you know, ultimately... Ford might have to produce vehicles like Tesla does, but I think we can all appreciate the foundational difference in the way that those vehicles are brought to market and the challenges that Ford will have to go through to get there. And I think it's a lot of the same for the hyperscalers, right? I'm not going to tell you a trillion-dollar company with tens of thousands of engineers can't do what we do. But I will highlight the innovator's dilemma that sits there, because there's an existing product. You have to change everything underneath to run infrastructure like we do. And it's going to be hard to get there.

SPEAKER_01: And these, if we go through the economics of, let's just say, an H100, this is NVIDIA's state-of-the-art... It's not just a GPU, it's a rack, essentially. It's a platform to put many GPUs on. I'm not sure exactly how many the H100 holds, but it holds a number of GPUs. They go for like 30, 40 grand each, is my understanding.

SPEAKER_00: So, it's a... It's a server, or node, as we call it, within a network fabric. And a server has typically eight GPUs within it. And then you put those into a cabinet, and you put those into a data center, and bring power into the data center, and you connect it to the internet. And then, yes, that price range was accurate for each one of those GPUs in a server. So a server can cost upwards of a quarter million dollars.

SPEAKER_01: And so you rent out one of those H100 GPUs, an individual GPU in a server like that, a cluster, a node, for four bucks an hour, something to that effect, yeah?
SPEAKER_00: Yeah, yeah, that's right. Where we specialize, though, is doing it at scale, right? Like, we don't have many clients who just use one at a time. Our clients will use 10,000 at a time in a single contiguous fabric, which makes it a supercomputer. And it's interesting. This actually has become where we operate some of the world's largest supercomputers at this point. I think several of the top 10 now sit on our platform because of how large these fabrics are and how performant these GPUs are at these specific tasks.

SPEAKER_01: So somebody fires up 10,000 of those. I'm assuming they get some kind of volume discount. So if it was $2 or $3 an hour, they're spending $20,000, $30,000 an hour on a job at one of those, correct? Something in that range?

SPEAKER_00: Yes, but I will correct the discount, because it actually works inverse, right? It's extremely surprising, but it's because building a single fabric of this size is so engineering-intensive that not many people are able to do it. There's not a template. Not a lot of companies have gone out and done it. It's actually maybe three or four on the planet who are actually building fabrics of this scale. So you actually make a more scarce resource through scaling. You're kind of decommoditizing the market through scale. It's not one GPU or 10,000 GPUs. It's, oh, 10,000 GPUs. That's a totally different engineering solution.

SPEAKER_01: Juggling multiple devices and apps to run your business is a mess. OpenPhone is here to make it simple by simplifying your business communications with one easy-to-use app. OpenPhone has rethought every detail of what a modern business phone should be. And here's the magic: it works through a beautiful, elegant app on your phone, or you can just use it on your desktop, making it super easy to get a business phone number for your entire team. And you know how brilliant OpenPhone is. My teams use it every single day. My sales team loves it. My ops team, they use it all day long. And here's the features that we love. You can create a shared phone number, like customer support, with multiple employees fielding all the calls and all the texts to that one number. At my investment firm, Launch, we pride ourselves on replying to every single call or email instantly. And OpenPhone is the number one rated business phone on G2 for customer satisfaction. So here's your call to action. Super easy. OpenPhone is already affordable, starts at just 13 bucks a month, but Twist listeners get an extra 20% off any plan for the first six months at openphone.com slash twist. And if you have existing numbers with other services, no problem. OpenPhone is going to port them over, easy peasy lemon squeezy, no extra cost. Head over to openphone.com slash twist to start your free trial and get 20% off. And how much of that cost, when we look at a $4 an hour cost, would you say energy is? Because these things are extremely energy-reliant, right? They consume a lot of energy. So I'm curious how much of all this is energy, and then where do you put your... This will take us down the energy rabbit hole, but where do you put your data centers, and then where are data centers going to be in the future? Because these GPUs are taking a multiple of what CPUs would use, yeah? So maybe you could explain that to us, yeah.
SPEAKER_00: Yeah, it actually causes, you know, a pretty substantial bottleneck that exists in the market now, which is data center capacity, right? And it's not square footage of data centers, it's data centers that have enough power brought into them, and adding more power to those data centers. And arguably, that's where the next bottleneck in this GPU cloud infrastructure market sits: how do we access enough data center space to accommodate the volume of demand that's coming in? So power is roughly about 10% of our cost to deliver this infrastructure. The infrastructure itself is actually where most of the cost sits, from a depreciation perspective. We depreciate over a six-year life on the infrastructure. The power side is, that's my background, right? It was in power markets and trading different electricity markets. So it consumes a lot of power, but it has an immense amount of efficiency over CPU infrastructure as well, for what it's doing there. To run the same workloads on GPU versus CPU, it's actually more power-efficient to run it on a GPU. If you're trying to achieve the same outcome, you'd have to use so many CPU cores to get to the same solution. So yes, they consume more power on a density basis, but on a workload basis, they're more efficient.
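A rough back-of-the-envelope sketch of the unit economics discussed above. Every figure is an illustrative assumption drawn from the ballparks in the conversation (roughly $250k per eight-GPU server, $2-4 per GPU-hour, a six-year depreciation life, power at about 10% of delivered cost); none of this is CoreWeave's actual pricing.

```python
# Back-of-the-envelope GPU cloud economics. All figures are illustrative
# assumptions taken from the ballparks in the conversation.

GPUS_PER_SERVER = 8            # "a server has typically eight GPUs within it"
SERVER_PRICE_USD = 250_000     # "upwards of a quarter million dollars"
RENTAL_PER_GPU_HR = 4.00       # "four bucks an hour, something to that effect"
DEPRECIATION_YEARS = 6         # "depreciate over a six-year life"
HOURS_PER_YEAR = 24 * 365

# Straight-line hardware depreciation per GPU-hour.
depreciation_per_gpu_hr = (
    SERVER_PRICE_USD / GPUS_PER_SERVER / (DEPRECIATION_YEARS * HOURS_PER_YEAR)
)

# Crude proxy for power: apply the ~10% share to the rental rate.
power_per_gpu_hr = 0.10 * RENTAL_PER_GPU_HR

# A 10,000-GPU cluster at an assumed at-scale rate of $2-3 per GPU-hour.
for rate in (2.00, 3.00):
    print(f"10,000 GPUs at ${rate:.2f}/hr -> ${10_000 * rate:,.0f} per hour")

print(f"hardware depreciation ~ ${depreciation_per_gpu_hr:.2f} per GPU-hour")
print(f"assumed power share   ~ ${power_per_gpu_hr:.2f} per GPU-hour")
```

At these assumptions, depreciation alone works out to roughly $0.59 per GPU-hour, which is consistent with McBee's point that the hardware, not the power, is where most of the cost to deliver sits.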
SPEAKER_01: And this is, I mean, it's staggering. I was looking at one study that said, like, one of these GPUs at 60, 70% capacity for a year is like the average American household's energy consumption. And that's just one of them. So this would be the equivalent of, like, if somebody's using 10,000 of these, or I think Zuckerberg's going to be using low millions of these, it's like putting a million households online, or something to that effect, yeah?

SPEAKER_00: Yes, it's an immense amount, but it's also a transformational technology, I'd say, that we're looking at. And its ability to unlock value from data is something that we've never observed before.

SPEAKER_01: Yeah, so that speaks to justifiable... Like, yeah, I mean, I'm not even looking at this judgmentally, like, is it worth the amount of energy it's consuming? I was looking at it pragmatically: let's assume it is, and ask where it comes from. Let's say it's going to cure cancer. It's going to find solutions for renewables or fusion that we didn't even conceive of, or the gains from it will be so extraordinary it will obviously pay for itself and create an energy-independent future. But what's happening in the industry today as people are buying these and looking for places to put them, and you're looking to build up your infrastructure? Are we just out of energy? And where are the nooks and crannies where people are looking to locate these facilities? I heard that nuclear power plants are going to become, like, a place where people put these... The plan is to put a nuclear power plant and these GPU data centers next to each other. Is there any truth to that?

SPEAKER_00: Yeah, look, I believe that was Microsoft or Amazon effectively taking that nuclear plant and siting a data center next to it. Power, I think, you know, beyond data center space, is a national concern, so to say. There's been an immense amount of building out renewable capacity over the past decade, which is fantastic, but it's also not necessarily the right kind of capacity, what you need for consistent demand growth. As you know, solar works when the sun's out, wind works when the wind blows. Neither of those things works for a data center, or even necessarily for electric vehicles, for all these kinds of demand areas. We need more baseload power. That's traditionally come from coal and natural gas. Fortunately, it's been more so from natural gas over the last decade, because coal is quite dirty from an emissions perspective. And my personal hope is that it's more nuclear going forward, but it takes time to build nuclear sites, right? So I think it's a decade for, you know, siting and build, and... In the United States, and we haven't built one in a long time.

SPEAKER_01: I mean, I think the last one broke ground in the late 60s or early 70s, and we haven't had one since. So this would lead one to believe that somebody with a lot of nuclear power and a lot of GPUs would have a massive advantage.

SPEAKER_00: It certainly helps. We've contracted a substantial amount of capacity, looking to ensure the growth profile of our business, but it is going to be a bottleneck for all other participants in the market.

SPEAKER_01: What about heat? These things throw off a lot of heat. And some areas in the country are warmer than others. Are people moving these data centers north in order to get the cold air? We've seen pictures of data centers that have open sides where cold air just blows right in, or open doors, essentially. Because in other places, if you were to put these GPUs in Texas, I think you're going to be air conditioning them, which seems doubly inefficient. So maybe talk a little bit about the heat these things generate today, and if there's any hope of cooling them down without air conditioning.

SPEAKER_00: Yeah, so it's a great question. It's funny, like, it takes me vividly and visually back to my crypto mining days, where we did run those warehouses with the open sides and the giant fans, and we were up north. You could never run this infrastructure in those environments. From a security perspective, from a reliability perspective, it is mandatory to run this infrastructure in what's called a Tier 4 data center environment, or sometimes even a Tier 5 data center. And that's the highest classification in terms of reliability, redundancy, security, and environmental handling, right? So these are sites that you would see, like, Amazon or Google or Microsoft running within, right? Like, true data centers that are meant for cloud infrastructure. And the way to think about the heat output... The other variable in there, which is wild, is actually the sound.
They're extremely loud in these environments, upwards of 100 decibels, which is a direct derivative of the heat they're producing and of having to move the air. But the way you look at it is the critical load around the infrastructure, right? So if it takes one unit of energy to run the infrastructure, it takes another 0.2 to 0.3 units of energy to cool the infrastructure and run the networking and everything else around it. The way that the world's leading sites handle this is just through forced air, right? Just move tens of thousands of cubic feet per minute of air through these highly contained pods. So you have all the hot air in a really small area and you're just jamming air through it, right? And sometimes it's conditioned, sometimes it's just air, but eventually it's going to be liquid. We're going to move to an environment where you have direct-to-chip liquid cooling instead, and that efficiency ratio, call it 1.3, will drop to about 1.1. So your GPU infrastructure will inherently become more energy-efficient as we move to a liquid-cooled environment. And we're working with leading data center operators such as Switch to facilitate and implement that movement for these upcoming generations of GPUs.

SPEAKER_01: When people say liquid cooled, most people have not actually physically seen that.

SPEAKER_00: Yeah.

SPEAKER_01: Unless maybe you're a gamer and you've seen your chip in a tube run to the chip and there's literally liquid on the top of the chip that's cooling it. How do these... When you say liquid cooled, what could people envision of how these solutions are going to work? Are they going to just be like a bunch of racks in an Olympic-sized swimming pool? Or is it just like little contained amounts of water on top of the GPUs?

SPEAKER_00: Yeah, so you're qualifying that correctly. There are two broad categories of liquid cooling. There's immersion cooling, which is the Olympic swimming pool method, downsized, obviously. And then there's direct-to-chip liquid cooling, which is running the pipes to the chips. We will sit on the direct-to-chip liquid cooling side because it's operationally more efficient for us. That's where we think the sector is broadly going to go. If you think of immersion cooling, right, like, you're literally dunking a server into a vat of liquid. That liquid has its own problems with it as well. But let's say you had to go service that server, there's a node or component wrong with it. Well, you've got to lift it out of the liquid. And what happens then? Well, you've got to wait probably an hour for all the liquid to drain out now before the tech can even get into it. So you're extending these service times materially, and response times, versus direct-to-chip liquid cooling. Pop it out, and you don't have water containment issues. You don't have things splashing across the data center. It's a mess. These are highly sterile and contained environments that don't even let us bring cardboard inside of the data center area, because it's combustible. And you can have little particulates that float around the site and can accrete into the nodes. Bad. You don't want fires in data centers.
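The "efficiency ratio" McBee describes above is essentially PUE (power usage effectiveness): total facility power divided by IT power. A minimal sketch of what the quoted move from about 1.3 (forced air) to about 1.1 (direct-to-chip liquid) would mean, assuming a hypothetical 10 MW GPU load:

```python
# Facility power implied by the cooling "efficiency ratio" (akin to PUE).
# The 1.3 and 1.1 ratios are from the conversation; the 10 MW IT load is
# a hypothetical example.

IT_LOAD_MW = 10.0  # power drawn by the GPUs themselves

for label, ratio in (("forced air", 1.3), ("direct-to-chip liquid", 1.1)):
    total_mw = IT_LOAD_MW * ratio        # total site draw
    overhead_mw = total_mw - IT_LOAD_MW  # cooling, networking, everything else
    print(f"{label:>21}: {total_mw:.1f} MW total, {overhead_mw:.1f} MW overhead")

# Under these assumptions, liquid cooling frees ~2 MW per 10 MW of GPUs,
# capacity that can feed more compute instead of fans.
```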
SPEAKER_01: When you are talking about that much air being pushed around, that means any particulates in the air are going to get pushed around. And so if you just had some very small amount of particulates floating around a room, now imagine that room is changing the air every X amount of time. The number of particulates is going to grow, and then you're going to have a small fire on a chip, which is just absolutely crazy. Do you think the demand is going to keep up? Are you seeing any signs of demand... people saying, okay, we built our 10,000 GPUs. We're making more efficient software, making more efficient use of the chips. Okay, yeah, we're getting to a steady state. We've bought enough. So are you starting to see that, with your customers saying, you know what, we've got enough GPUs, we've got enough infrastructure right now? Or are they still in the begging, pleading, and doubling? What are you seeing from top customers? Are they doubling their capacity every year? Are they tripling? What's the field report?

SPEAKER_00: Yeah, I'll qualify it a couple ways. So one way: we will increase our revenue by about tenfold this year. And we're already sold out of all of our capacity through the end of the year. So I have a build schedule. We have about 500 employees today. I'll be closer to 800 by the end of this year. That build schedule is fully booked this year already. We see that broadly across the sector. There's just an immovable wall of demand there for this compute. A lot of it is being driven from this move from training the models to inference, right? And inference is actually bringing the commercial value out of training, right? So you want to go train a foundation model. That takes compute to be built in the configuration that we build it, in these 10,000, 30,000 GPU clusters. And then you've got to go make it actionable, drive revenue off it, and bring a product. And what we're observing is it might take 10,000 GPUs to train a model, but inference is linked to the number of users. If you go into ChatGPT, for example, and query, that's spinning up a GPU. And now there's a million of you doing it, 5 million, 10 million. That informs the size of inference. So inference will really truly be linked to the growth of this market. And we're seeing users who are using 10,000 GPUs for training need hundreds of thousands for their early-stage inference products. So we don't see...

SPEAKER_01: Demand going anywhere but up and to the right. Got it. For this infrastructure, while they may not need exponential use of, you know, GPUs for training, they'll get more and more efficient at that. And what they will need is that inference. When people ask the query, that's inference, not training the model, but asking a question of the model. That is massively compute-intensive. And an H100, if we were to look at that unit in an hour at full capacity, how many queries... You know, "it depends" is always the answer, but an average query, like, these things were costing a couple of pennies per query. Is that correct? Ballpark?

SPEAKER_00: Yeah, that's correct. And I think that's the right way to qualify it, as cents per query or dollars per query. So you're getting in hundreds of queries within that period. And as you said, that will become more efficient over time as well. But it's just an unbelievable volume of demand. And when you step back and think about it, you look at the existing cloud infrastructure that was built over the last decade, it wasn't built for this use case. It was built for serializable workloads. It wasn't built for parallelizable workloads. It's like you're having to rebuild the cloud, so to say. You're having to rebuild physical infrastructure at the pace of AI software adoption. It's a mind-blowing concept, right? Because AI software is being adopted at the most rapid scale of any technology that we've ever observed. And you're asking people to build... I mean, we're building at, I think it's 28 data centers this year across North America. We're one of the largest operators of this infrastructure in the world. We are unable to keep up with demand. And we really don't see that subsiding for years to come.
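As a rough consistency check on the cents-per-query framing above: at the quoted ballparks, one GPU serves on the order of a hundred or two queries per hour, and fleet size scales with users. The rates come from the exchange above; the user and query volumes below are hypothetical.

```python
# Rough inference arithmetic: queries per GPU-hour, and fleet size as a
# function of users. Rates and per-query costs are ballparks from the
# conversation; user and query volumes are hypothetical.

RATE_PER_GPU_HR = 4.00

for cost_per_query in (0.02, 0.04):
    queries_per_gpu_hr = RATE_PER_GPU_HR / cost_per_query
    print(f"at {cost_per_query * 100:.0f} cents/query: "
          f"~{queries_per_gpu_hr:.0f} queries per GPU-hour")

# If 10 million users each issue 10 queries a day, and one GPU serves
# ~150 queries an hour, the fleet needed just for that load:
users, queries_per_user_per_day = 10_000_000, 10
queries_per_hour = users * queries_per_user_per_day / 24
print(f"~{queries_per_hour / 150:,.0f} GPUs to serve that load")
```

The last line lands around 28,000 GPUs for a hypothetical 10-million-user product, which is directionally consistent with McBee's claim that customers training on 10,000 GPUs need hundreds of thousands once inference scales with users.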
SPEAKER_01: Listen, as a founder, there are things I love doing, like building products or meeting with partners, hanging out with my team and dreaming up new ideas. And then there are chores that I don't want to do. I don't want to do HR. I don't want to do payroll. I don't want to deal with all that. So I use Gusto. Gusto is the best for payroll, for HR services, and for running a small business. It makes everything so much easier. Even a mid-sized business, man. I've got a lot of portfolio companies that are pretty sizable using Gusto, because it is designed for you, the small business owner. And payroll is something you definitely do not want to mess up. You've got to get it right. And Gusto is going to make it perfect for you by calculating paychecks perfectly. Also, payroll taxes. You've got to get your taxes right. You can't make mistakes there. And you want to set up open enrollment. You want to be good to your people. Gusto handles onboarding, health insurance, 401k, time tracking, commuter benefits, all the letters. And they even give you access to HR experts. So Gusto takes all of this off your hands and lets you focus on the important stuff: your product and your customers. It's super easy to set up and get started. And if you're moving from another provider, Gusto will transfer all your data for you. Here's your call to action. Because you're a Twist listener and you're part of the family, you're going to get three months free. Incredibly generous. Totally unnecessary. Thank you so much to our friends at gusto.com slash twist. You must go to Gusto, again, gusto.com slash T-W-I-S-T, to get three months free. Thank you, Gusto team. So then this would lead us to LPUs. Obviously, using a GPU, very expensive, right? But Groq, my friend Chamath's company, has this inference engine and these LPUs. Are you starting to see those and that hardware stack emerge, these language processing units? And do you think that'll have a good effect on the industry in terms of lowering costs and having purpose-built hardware for the inference moment?

SPEAKER_00: Yeah, so as opposed to The Lord of the Rings, right, where there's one ring to rule them all, I don't think that there's going to be one GPU, LPU, one accelerator to rule them all, nor do I think there's going to be one model to rule them all either. I think there's going to be lots of different models with different objectives, like models that do different things, whether it's helping drive a car or cure cancer or be an AI character. Models will do different things. And then there will be infrastructure that is most efficient for each different type of model. And I think that's why you're seeing entities like Microsoft, Meta, et cetera, who are focused on building their own silicon. They're not trying to replace the GPU. They're just trying to solve for different models that they're running internally. So I think the Groqs of the world will absolutely have a place somewhere. But I also think that you'll see GPUs have this place.
What we're observing is that their place is at foundation models, at latest-generation models: the most demanding and complex workloads will continue to sit on GPUs. And NVIDIA just has this unbelievable solution for iterating continually better generations of GPUs. And we think that those models will continue to accrete to NVIDIA's platform.

SPEAKER_01: So they're going to win the day, no doubt, NVIDIA, when it comes to training the models. Inference, you might see other folks carving out a niche for themselves, is how you would bet this emerges.

SPEAKER_00: Yeah, I think inference will have various levels of infrastructure that provide solutions for it. I will say, if a model is trained on A100s, it'll probably run inference on A100s as well. Like, it's kind of tough to make that architecture shift. And it's tough because of the software that NVIDIA has, right? Like their driver solution, CUDA. NVIDIA very thoughtfully open-sourced that driver solution in the early 2010s to support this sector and the engineers who wanted to work on these products, and it has become effectively a default solution across the market. It's similar to drivers for CPU. Everything was x86 for decades. And it didn't really matter if something was better than it or not. It's just what people use. Because there's an efficiency loss if you say, well, I'm going to go learn this other thing and just hope other people will use it. Or you could just use the thing that everyone else uses. And that's what dominates the market. And NVIDIA has an amazing moat that they've developed out of the superiority of their software solution for their infrastructure. And I think that's going to keep people using their platform for a long time to come.

SPEAKER_01: Now, are people using CUDA yet to address other GPUs, because it's open source, and it's obviously being used for parallel computing here? When you've got a supercomputer, you need to, you know, send a job across many different GPUs. Are people forking CUDA? Or have they adapted CUDA in order to, you know, have it send a job to some Intel servers, some NVIDIA ones? And is that opening up possibilities, I think, for a more open source future? And then I'm curious what you think of open source chips and chip architecture, and if you think that is ever going to have some sort of an impact here on the space?

SPEAKER_00: Sure. So this will get a little bit beyond my domain expertise, but yes, there have been forks, so to say, and software that enables CUDA to run on different infrastructure, but it comes at a hefty cost, right? It comes at performance loss. It comes with configurability loss. Like, so much so that none of our clients are requesting that, right? We're talking, you know, 30, 60, 80% performance loss, right? So the most natural thing for the largest consumers of this compute is to stick on NVIDIA infrastructure with NVIDIA software, right? And that goes to your second question, which is around open source. It's tough for me to say, but I would highlight the behemoth that's driving the research and the path forward on NVIDIA GPUs. They just have so much capital they're putting to work to ensure that they have the most performant piece of infrastructure in the market that, sure, there might be some use cases for that open source infrastructure to be applied, similar to how Groq can be there, or other custom silicon chips. I think the vast majority of workloads are going to accrete and stay with GPU infrastructure, of which...
SPEAKER_01: Who's number two or three in the space? Does anybody have a chance of closing the gap? Because obviously people are watching NVIDIA print money. And obviously, I don't know what percentage of the infrastructure you provide is NVIDIA, but I'm guessing it's 90% plus. So is there a number two or three in this space? And, you know, do they have a chance of gaining market share? Or do you think this is a fait accompli? We're just going to live in an NVIDIA world for the next decade?

SPEAKER_00: I think we're in an NVIDIA world for a while, right? You know, you have AMD out there, but AMD doesn't have a performant training fabric, right? Like, that's something that's proprietary with NVIDIA in InfiniBand, right? So you can't build a comparatively performant training fabric with AMD infrastructure. So you can kind of only use it then for inference. And it's sort of, well, if you've already trained your model on NVIDIA, it's a tough leap to want to move your software, move your infrastructure over to AMD. So it's certainly a market I would expect that AMD is allocating their time to, but we're not seeing the customer demand for it at scale, right? And we really service scale consumers of compute. Certainly, there's your guys who want ones or tens of GPUs out there who will say, oh, I'd love to work with MI300s. But it's those entities who want tens of thousands of GPUs that are sticking with NVIDIA, and we haven't really seen any deviation from that.

SPEAKER_01: And for folks who don't know, InfiniBand is kind of a contemporary or a competitor to Ethernet or fiber in a data center. If you had a bunch of storage in one location, or even in a GPU, between GPUs, passing data between them, there has to be some way to move data from one cluster to the other. If they were passing training data or something, the speed at which the training data can get on the GPU to be processed, that is a bottleneck. And InfiniBand is the solution to moving large amounts of data. Am I correct in my description?

SPEAKER_00: That's exactly right. That's exactly right. It's infrastructure that NVIDIA acquired and has integrated into their solution, called the DGX solution. And it is the most performant fabric solution, in other words, network solution, for this infrastructure for data throughput.

SPEAKER_01: Hey, startups, you're a new company and you're looking to form your business. But navigating through a maze of hidden fees and legal jargon, it's complicated. It's going to eat up all your time. Well, Northwest Registered Agent will form your business quickly and easily. And it only takes 10 clicks in 10 minutes. They provide you with a full business identity setup. That means they'll give you everything you need to start and to maintain your business. When you hire a registered agent to form your company, they take care of everything. You get a registered agent service, a business address, their corporate guide service, a phone line, mail scanning, a free domain, a website, and hosting. Northwest Registered Agent makes the whole process transparent, quick, and enjoyable. Whether you're setting up an LLC, a corporation, or a nonprofit, they've got you covered. Here's your call to action. For just $39 plus state fees, Northwest Registered Agent will form your company and launch your business in minutes. Visit northwestregisteredagent.com slash twist today. That's northwestregisteredagent.com slash twist today.
Is this still one of the key challenges in terms of training large language models, the throughput of the InfiniBand or Ethernet solutions to just move the data around? This is the bottleneck over GPUs in many of these jobs?

SPEAKER_00: Yes, it's critical to build with a non-blocking InfiniBand fabric. So non-blocking means that every component can operate at the same performance and efficiency as everything else. There's nothing blocking that performance.

SPEAKER_01: No bottlenecks.

SPEAKER_00: Yes, no bottlenecks. And it's really interesting, because it's a physical engineering problem. So a 16,000 GPU fabric, which is about 2,000 nodes, or individual servers with eight GPUs per server, it has 48,000 discrete connections that have to be made across the fabric. So you plug InfiniBand into each GPU in the server, then that goes out to a switch and you're part of this fabric, right? Every connection has to be made correctly. And you're doing this with 500 miles of fiber optic cabling within that 16,000 GPU fabric. And we run a number of those, of larger and smaller size, right? So we've built a lot of these things, run a lot of fiber in our days, but it's a complex physical problem that no one's really been presented with before, right? This wasn't a problem when you were running with Ethernet or, you know, hosting websites and storing data lakes. You didn't have to build fabric this way. It'd be one connection per server, not eight connections per server. And to be doing it in this contiguous, non-blocking fabric in a single footprint. And so it's just lots of new things that are happening at the same time, with an immense amount of capital at risk and an immense amount of capital that's being consumed, in the fastest-paced technology environment that we've ever been in. And it's creating problems all over the market. And where we've found ourselves is having a software solution and a company that's only focused on these types of workloads. And accordingly, we accrete clients onto our platform for having that best engineering solution and actually being able to deliver it to end consumers.
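The quoted 48,000 connections are consistent with a three-tier non-blocking fat-tree, where each of the 16,000 GPU ports implies one link at each switching tier. That topology is an assumption; the episode only gives the totals. A small sketch of the arithmetic:

```python
# Cabling arithmetic for a non-blocking InfiniBand fabric. GPU and node
# counts are from the conversation; the three-tier fat-tree layout is an
# assumption that happens to reproduce the quoted 48,000 links.

GPUS = 16_000
GPUS_PER_NODE = 8   # eight InfiniBand connections per server, not one
SWITCH_TIERS = 3    # leaf, spine, core: one link per GPU port per tier

nodes = GPUS // GPUS_PER_NODE   # 2,000 servers
links = GPUS * SWITCH_TIERS     # 48,000 discrete connections
print(f"{nodes:,} nodes, {links:,} fabric links")

# 500 miles of fiber spread across those links averages ~55 feet per link.
FIBER_MILES = 500
print(f"~{FIBER_MILES * 5280 / links:.0f} ft of fiber per link on average")
```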
SPEAKER_01: Yeah, and we've never seen at-scale companies like a Microsoft, like a Meta, like a Google... These companies are at scale. They have massive amounts of capital, which they can't deploy in M&A anymore, right? We have a framework in the West where you're not allowed to buy companies. And I made this point on All-In a couple of months ago. Instead, like, if you were Apple, or you're Google, and you're sitting on tens of billions, hundreds of billions of dollars in cash, you can't buy Uber, Airbnb, you can't buy Coinbase. You're not allowed to buy even Figma for 20 billion. You can't even make a small purchase like that without getting blocked. What's the next best thing you can do with that capital? You can build infrastructure as a weapon. And now you've got this massive infrastructure. Will you have jobs for it? I'm sure there'll be some. Will those jobs turn into commercial products? Some will, some won't. But it's a better use than sitting on the cash. Or it's a better bet. It's a better use of capital rather than trying to make a couple of points on it or buying back your shares. It feels like, gosh, if you have this infrastructure, you could have induced jobs, which is to say, some crazy person on the Meta team is going to be like, what if we did X? And having that infrastructure allows somebody with a crazy idea to then go give it a shot and spend a million dollars running a job across this infrastructure, whatever the pro rata version of it is, and maybe they find something really interesting. Who knows what people are going to do with this infrastructure? You do. You're watching them. What are the interesting jobs you're starting to see, and use cases? I mean, some of it's public and some is private, so I obviously don't want you to betray anybody's trust here, but just, what are people doing with this infrastructure that you find interesting, when they come to you and they say, hey, we need a solution for this, or here's what we're building? What are some of the things you think are most promising, on certain verticals, sectors that are most promising?

SPEAKER_00: Sure. So I think about the areas where AI will be adopted first and fastest, and do it at scale, right? Because you can always find, like, five users to do something, right? But how do you get five million users? It's going to be within products where the user doesn't have to learn something new. It might not even be a new button, right? It just comes naturally to them. It feels organic. It doesn't require, you know, a new app to be somewhere. It's integrated into existing products. And I think that's largely going to be co-piloting. Like, various co-pilot solutions, not the name of one product, but just the idea that you're integrating AI into apps to assist a user with a pre-existing process. That's something that we're seeing scale up right now. And the ability for those products to scale is limited by the amount of cloud infrastructure that's able to handle those users. Again, remember, each time you come in and query that co-pilot product, it's using a GPU. So cloud infrastructure inherently limits the pace at which those products can grow. And I think you've seen some products delayed, even, because there wasn't enough cloud infrastructure available to power their launch.

SPEAKER_01: Yeah, if you look at search engines like Bing, Bing kind of was doing the custom answers. You'd have to, like, click a second button to get it, right? Go to another experience. Whereas some, you know, search engines powered by AI were doing it automatically, because they didn't have a large flow of it. If every single Google search resulted in a query to a GPU, they would actually bankrupt Google right now, because they have so many queries, and at three or four cents extra per query, there's not enough infrastructure in the world to convert all of those queries today.

SPEAKER_00: Look, it brings up a question of, will AI be a tax or a margin expander for software products? I think for some of them, it will be a tax, right? It'll become mandatory, and they might not be able to drive incremental direct revenue off those products. But the other outcome, if you didn't integrate that AI at that tax, could be you lose users and you lose market share to someone else.
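Jason's "bankrupt Google" point is easy to sanity-check. In the sketch below, the three to four cents per query comes from the conversation; the roughly 8.5 billion searches per day is an outside, commonly cited rough estimate, used here purely as an assumption.

```python
# What per-query AI costs would mean at Google-search scale. The 3-4
# cents per query is from the conversation; the ~8.5 billion searches
# per day is an outside rough estimate, treated here as an assumption.

DAILY_SEARCHES = 8_500_000_000

for cents in (3, 4):
    daily_usd = DAILY_SEARCHES * cents / 100
    print(f"at {cents} cents/query: ${daily_usd / 1e6:,.0f}M per day, "
          f"${daily_usd * 365 / 1e9:,.0f}B per year")
```

Under these assumptions, the incremental cost lands somewhere around $90-125 billion a year, which is the scale of the problem behind the "flip their business upside down" point that follows.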
SPEAKER_01: Right. If you look at search, that would be the perfect example. If Bing offers this to, you know, their four or five percent market share, they can kind of lose money on it, because they're building that business. Whereas Google, it's their core business. If they put it on all 90% and they start losing money, they could just flip their business upside down, right?

SPEAKER_00: That's right. And, you know, the other interesting point to that is, you know, Google might not even have the option to integrate AI if it doesn't have the infrastructure available at the volume that's required. And I think that's why you're seeing some companies, Meta, Microsoft, throw so much CapEx into ensuring they have the volume of infrastructure necessary. Because having the compute at scale, going back to my point earlier, it decommoditizes compute. That in and of itself is a strategic advantage. So I'd say the other area that I personally think will accrete AI rapidly is the advertising sector.

SPEAKER_01: Oh, really? I thought you were going to say healthcare or biology or something.

SPEAKER_00: I agree. I think that's the second-half-of-this-decade thing that we're extremely excited about. I mean, I can't wait to have infrastructure that directly supports the advancement of healthcare solutions. But advertising, I mean, think of the way that ads work. You throw an ad, you hope it reaches an audience, and then a subset of that audience will actually identify with it. It's probably a pretty small sliver of it. Instead, if you could use generative AI to create on-demand, always-on ads for people that are 100% specified to the metadata associated with that user, those are going to be much more highly effective. Here's the example: you live in Utah, you have a green kayak, and you're searching for a new Kia. Right. It's a blue Kia. And you've been looking at it for a few days. And instead of just receiving the general Kia ad of a gray Kia somewhere, you now get an ad that is a blue Kia with a green kayak on top, driving through the desert in Utah to a river. You can get several different iterations of that until you go buy that Kia. That's an area where the user doesn't know that it's generative AI, but it'll be so accretive and disruptive to the advertising sector that it'll just be mandatory for them to use it, right? Because the ads' effectiveness will increase that much.

SPEAKER_01: It's fascinating. You look at what happened with Meta. There was this idea that when they lost access to smartphone data, when Apple anonymized it, they would have a really hard time doing targeted advertising. It actually kicked them in the ass and made them implement AI. And they have now recovered and gone further, I think, in terms of personalization. You're exactly right. If it knows you have two kids, it's going to put in that Jeep Wrangler with your kayak on the top, or whichever car it is, two car seats. It's going to show kids in it. And the message will have something about how great it is for toddlers or young kids. And here's the media center that puts the TVs on the back so they can watch Netflix. It's going to have so much information to customize the ad that the gap between the aspiration of the ad and the reality of your life is going to close, right? Because ads are aspirational. It's like Minority Report, if you remember. And everything goes back to Minority Report. The customization of the ads will be absolutely phenomenal, to a level that... Yeah, it's like... Beyond creepy, it's just like mind-reading ads.

SPEAKER_00: Yeah, and you've never had that before, right? And what's important there is it took X amount of resources to generate that one kind of mass media ad previously. Well, now, each time you have that iterative, always-on ad, that's querying infrastructure.
So the infrastructure demand for this new type of advertising will be voluminous. Yeah. It'll be more effective. And I think it will actually be better dollars spent in advertising, but it'll be an immense amount of infrastructure demand behind it. So I think Copilot's there today and scaling, but the next big thing in there to really scale will be within the advertising space.

SPEAKER_01: Yeah, it makes a lot of sense. It's a huge business. And yeah, anywhere there's a lot of data and a frequent transaction... I mean, that's just a great place for GPUs and this AI revolution to take part, because it's frequent and there's a transaction. This is where, like, Amazon, and how Amazon sells you stuff, and Walmart and Target... My Lord, e-commerce in the last mile. It's already been impacted in a way. I mean, if you look at the amount of advertising revenue for Uber and Instacart, which are, you know, for Uber Eats, essentially... You know, that's like being in line at the checkout counter. And they're doing a billion each, I think, a year, roughly. And then Amazon might be doing 30 or 40 billion in advertising now. Like, those three businesses, which are seemingly transaction-based businesses, right? Shopping for groceries, food delivery, mobile transportation, and Amazon, buy anything. They're all becoming advertising businesses. It's like pure profit for them. It's going to be wild, the AI impact on those businesses.

SPEAKER_00: And all roads lead back to generative AI, right? It's just all converging here, right? And it all needs this type of infrastructure. And that goes back to my point of, like, how much demand is there. We just don't see the path to resolve the amount of infrastructure that needs to be built for the demand that there is within, you know, at minimum the next few years, right? There's just so much that needs to be built, because last generation's clouds aren't designed for this. And it's not like you're swapping out a UI and saying, like, oh, you tweak some software here and there and all of a sudden it works. Right? No, it's a foundational difference. It's the Tesla versus Ford manufacturing process.

SPEAKER_01: Yeah, and CPUs are just never going to take these workloads. CPUs will be for serving up images or light work. It's not going to ever compete at this level. It's completely different. Any worries about overbuilding this infrastructure at this point? If we were going to start talking about a slowdown or a certain amount of infrastructure, is there somewhere on the chart that you start thinking, yeah, we'll fill the demand five years out, 10 years out? Where do you think we have enough capacity, you know, and the supply-demand becomes normalized? Right now it's abnormal, obviously. When do you think this normalizes? When do we catch up?

SPEAKER_00: So between the infrastructure demand and the data center demand, right? So there's multiple components in here. And then, you know, it's all the infrastructure pieces that go into a data center. Then it's the power that goes into it. It's this really complicated physical stack to serve it. It honestly could be the end of this decade until you see this rebalancing of supply and demand. And that's not to say that that's overbuilding. That's just still on a heavy growth trajectory. That's just when infrastructure may have had an ability to catch up to where demand is.
And I geek out over this stuff, because that's my background, and my co-founders were all from this commodity trading sector, where all we did was assess supply-demand and understand physical disruption of commoditized markets. And that's exactly what we're looking at here.

SPEAKER_01: Yeah. But these aren't commodities yet. You know, like, H100s trade at a price, and I guess they're commodities, but they feel like a very resource-constrained commodity right now. So I guess they are commodities, yeah.

SPEAKER_00: They're sort of... You know, if you think about, like, cloud infrastructure for hosting websites, right? It was fungible, right? It didn't really matter if you were on AWS, GCP, Azure to host your website. It all felt like the same thing. It was the same product to go host your website, right? Yeah. What's changing is that lack of fungibility. An H100 hosted at AWS is very different than an H100 hosted at CoreWeave, because of the way that we run that infrastructure differently from a software perspective and the way we build it from a physical perspective. So that's the commoditization that did exist that's now being decommoditized through software and infrastructure disruptions.

SPEAKER_01: Right. Yeah. It's amazing. What a moment. What a time to be alive. It's so much fun. It was absolutely fascinating to talk to you for an hour, and I'll let you get back to, uh, racking and stacking. I'm sure you've got tons of H100s and A100s to unbox. I mean, just unboxing and racking stuff. I mean, you have hundreds of people doing that at this very moment.

SPEAKER_00: Hundreds, and semi-trucks arriving to our 28 data centers across the US. It's an operational feat. I think we're hiring 20 people a week right now. Yeah.

SPEAKER_01: And these are, like, systems operations people. These are high-level people to come in and configure this infrastructure. Well, massive success. And thanks for building out the infrastructure. Let's solve some huge problems. And we'll see you all next time on This Week in Startups. Bye-bye.