World Labs is a startup founded in 2024 by Fei-Fei Li, a prominent AI researcher and Stanford University professor. It is dedicated to developing the next generation of AI systems with "spatial intelligence".
Since its establishment, World Labs has completed two rounds of financing, raising a total of approximately US$230 million. Major investors include a16z, Radical Ventures, NEA, NVIDIA NVentures, AMD Ventures and Intel Capital. The company's valuation exceeded US$1 billion in just three months, making it a new unicorn in the field of AI.
Recently, Fei-Fei Li sat down with two a16z partners, Martin Casado and Erik Torenberg. For the first time, she spoke publicly about the concepts, research direction and grand vision behind the founding of World Labs.
Fei-Fei Li stated the central point of the conversation at the outset: "I don't need a large language model to convince me; the world model is the really important direction."
She emphasized that spatial intelligence is an indispensable part of intelligence, whether it concerns the three-dimensional physical world we live in or imagined digital universes. And today, we finally have the ability to generate and reconstruct these universes.
▍Intelligence older than language: spatial perception and 3D reconstruction
Fei-Fei Li pointed out that, compared with language, spatial perception is a far more ancient and instinctive ability in human evolution. She shared a personal experience: a few years ago, she temporarily lost her stereoscopic vision due to a corneal injury. During that time, she did not dare to drive alone. Even on familiar streets, she found it difficult to judge the distance to the car next to her.
This firsthand experience made her deeply aware of the fundamental role that 3D perception plays in human action. An AI that cannot build a 3D world model cannot truly understand, manipulate or reconstruct the real world.
Martin Casado added that the lack of this three-dimensional intelligence is the key reason robots and embodied AI systems have been slow to reach real-world deployment. He gave a simple example: take a person into an unfamiliar room, blindfold them, describe the space using only language, and then ask them to complete a task. It is almost impossible. Once the blindfold comes off, the brain automatically reconstructs a spatial model and the task can be completed. This reconstruction ability is entirely absent from today's mainstream language models.
▍From NeRF to the world model: a technical tipping point
Asked why she chose to establish World Labs at this moment, Fei-Fei Li said it is the result of years of accumulated academic research and industrial groundwork.
She recalled that as early as four years ago, a research breakthrough called NeRF (Neural Radiance Fields) opened up a new path for 3D visual modeling. NeRF's lead author, Ben Mildenhall, is now one of the co-founders of World Labs.
Another co-founder, Christoph Lassner, conducted pioneering research on efficient 3D representations, helping bring volumetric 3D modeling back into the industry.
Together with Justin Johnson, known for his early work on neural image style transfer, these once-scattered lines of research are now united in a single team focused on one "North Star" goal: building world model capabilities for AI.
Martin framed this goal as the deep integration of two systems: one is the AI models, data and architectures themselves, and the other is the engineering stack for graphics rendering and spatial reconstruction. Enabling experts from these two worlds to collaborate efficiently on a single platform is itself an important organizational innovation in the technology industry.
▍The language model is not the end, but the prologue
Fei-Fei Li emphasized that her belief in the world model does not come from disappointment with LLMs, but from a deeper understanding of the nature of intelligence.
She pointed out that language is a "lossy compression" of cognition: it abstracts the world but loses rich physical and perceptual information along the way. The real world has no words, grammar or text, only physics, movement and three-dimensional structure.
This view also changed her thinking about what form an AI company should take. She went from Stanford professor to entrepreneur because she realized that academic research alone is far from enough to model spatial intelligence. It requires industrial-scale investment in computing power, system-level architecture and orchestration, and the ability to bring top interdisciplinary talent together.
All of this can only be achieved in a highly organized company with outstanding full-stack engineering collaboration.
▍Applications of spatial intelligence go far beyond robots
For most people, "world model" is still an abstract research term. But Fei-Fei Li and Martin both pointed out that its applications extend far beyond autonomous driving and robots.
Creativity is essentially visual. Industrial design, filmmaking, architecture and even game development all rely on three-dimensional construction and manipulation. An AI with world model capabilities can not only "understand" the three-dimensional world, but also "generate" and "manipulate" virtual spaces.
Martin described how, from just a photo of a table, the model can infer the shape and material of the parts hidden from view and then build a complete spatial scene. On this basis, users can even measure, add, delete or redesign the space. This is a more intuitive and flexible form of human-computer interaction than text instructions, and it opens up a new dimension for design, creation and simulation.
Fei-Fei Li went further, arguing that digital space brings an unprecedented opportunity for change: "Humans have only lived in a three-dimensional physical world so far. But the digital world will allow us to enter the 'multiverse' for the first time."
She listed several examples: some universes are built specifically for robots, some serve human creativity, and some are used for storytelling, communication and immersive travel. These spaces, which once existed only in imagination, can now be genuinely generated, understood, used and transformed by machines.
▍The next battleground for foundation models: full 3D modeling
Returning to the technology itself, Fei-Fei Li emphasized that World Labs aims not only to create an AI that "can see", but also to make AI understand the 3D structure, dynamics and compositional logic of the world. This is not only a harder engineering problem, but also a new philosophy of representation.
She believes that scientific discoveries such as the double helix structure of DNA and buckyballs are products of spatial intelligence. Such geometric structures cannot be deduced through language alone. This is why the world model can not only improve machine understanding, but also open up new creative paths for human science and art.
Martin concluded that the LLM revolution demonstrates one thing: when we find the right data structure and model representation, AI capabilities improve exponentially. They believe the "world model" now stands at a similar tipping point.
▍The key to understanding and building the world
"We are actually walking backwards on the road of evolution." When Martin made this point, the conversation moved to a philosophical level.
Language is one of the most recent modules to evolve in the human brain, while the spatial perception system dates back to the arthropods, some 500 million years ago. An AI that only "learns language" cannot really be said to "understand the world". Only by building a human-like spatial model can AI truly step through the door of "embodied intelligence".
Fei-Fei Li concluded with her usual firm tone: "I have been waiting for this day. It's not because I don't believe in language models, but because I know very well that the real world is not made up of text."
And the world model is the key for AI to truly understand and build this world.