Scale AI CEO: Our data engine generates 'nearly all the data necessary to fuel' leading AI models
Fundraising in the AI sector is moving at a dizzying pace: $19.4 billion raised so far in 2024, according to Crunchbase data. Some of the latest high-profile examples: Anthropic and CoreWeave fetching nearly $20 billion valuations. Cognition, a company without any meaningful revenue and a product less than a year old, reportedly looking to be valued at $2 billion. The latest is Scale AI, announcing a new funding round today, raising a billion dollars at a valuation of nearly $14 billion. The company helps train AI systems across various industries, with customers ranging from the US Army to Microsoft and Meta to other AI companies like OpenAI. Joining us now in an exclusive interview is Scale AI founder and CEO Alexandr Wang. Scale was just named to our CNBC Disruptor 50 list, in the number 12 position, last week. Alex, it's great to have you on the show. I do want to start with this latest funding round. It nearly doubles the valuation of the company. What does this capital enable now?

Yeah, well, we're incredibly excited, and great to see you. You know, Scale is the trusted data foundry for AI. Our data engine generates nearly all of the data necessary to fuel the leading large language models in the industry today. And looking forward, the AI industry has very major requirements to continue scaling and having the impact that I think we all know it deserves. If you zoom all the way out, AI boils down to three pillars: data, compute and algorithms. Folks like OpenAI and others help solve the algorithmic problems. Folks like NVIDIA help solve the compute problem. And our role in the ecosystem is to help solve the data problem for the rest of the industry. Our vision is one of data abundance.
You know, we think we need to ensure that we have more than enough data, and the means of production to continue producing more data, to scale frontier large language models and frontier AI systems to be incredibly performant and scale us all the way to AGI.

It's interesting to hear you say this, because we talk a lot about the cost to train these large language models, we talk about the possibility of regulation, we talk about infrastructure and power needs, all of these as possible roadblocks to broader, more aggressive adoption of generative AI. Are you arguing that data could actually be one of those things that holds up that broader adoption?

Exactly right. I mean, I think if we do nothing, and if we at Scale don't continue innovating, we're likely to face bottlenecks in data similar to the ones that we see in computational capability and chip production, or power, or data center build-outs. You know, these are all fundamental bottlenecks in the supply chain of AI development and deployment. And so our mission, our goal, is to build this data foundry to increase the means of production, basically increase the capacity of the overall AI supply chain, to ensure that data doesn't become one of those bottlenecks. I think even more than that, we think there are three principles that have to hold into the future of AI data for AI systems. The first is data abundance: I think we need to make sure that we have an era of data abundance rather than data scarcity. The second is frontier data: I think we need to make sure that we're continuing to produce data that's sufficiently complex, sufficiently advanced, that's able to actually push the boundaries of current AI capabilities toward complex reasoning agents, multimodality, multilinguality and more. And then the last is measurement and evaluation.
I think that to properly build confidence in these AI systems, we need to make sure we have an evaluation system that actually enables us to constantly measure their capabilities, to ultimately drive further adoption and scale impact.