I've been fortunate to work on complex technologies spanning computer vision, natural language processing, and multimodal AI over the past three years as an AI/ML PM at 100-150-person labs in San Francisco. Along the way, I've heard spatial AI, spatial computing, multimodal LLMs, embodied AI, and similar terms used interchangeably to refer to a wide variety of technologies, and sometimes inaccurately.
True to PM form, and borrowing a page from VCs (:')), this post is my attempt to demystify the complex topic of Spatial AI and establish some ground truths through a market map.
Spatial Computing vs. Spatial AI
Spatial technologies, at their core, bridge the digital and physical worlds by giving machines the gifts of perception and presence: the ability to understand, interact with, and navigate space. This convergence represents one of the most significant technological shifts underway today, yet it remains poorly understood by many outside specialized research circles.
Spatial Computing
Spatial Computing is the broader technological framework enabling machines to understand and interact with physical and virtual spaces, including:
Hardware (sensors, displays, processors)
Software systems and environments
Interaction paradigms
Infrastructure bridging digital and physical worlds
Examples: AR/VR headsets, LiDAR sensors, 3D mapping tools, robotic navigation systems
Spatial AI
Spatial AI, a subset of spatial computing, focuses on the AI/ML components that allow machines to perceive, understand, reason, and interact with spatial environments. Spatial AI includes:
Computer vision algorithms that understand scenes
AI models that can reason about spatial relationships
Machine learning systems that can predict spatial behaviors
Foundation models trained to understand physical spaces
Neural networks that generate 3D environments
Simply put: Spatial Computing provides the technological foundation, while Spatial AI provides the intelligence layer. For example, in the case of self-driving cars:
Spatial Computing for AVs encompasses the entire technological stack: the sensors that perceive the physical world, the processing systems that handle spatial data, and the interface systems that translate decisions into actions. It provides the hardware and software framework needed to operate.
Spatial AI is specifically the "brain" of the self-driving car: it processes this spatial data to build an understanding of the physical environment, reason about spatial relationships, and make navigation decisions in real-world contexts. It provides the intelligence the vehicle needs to be truly useful.
Spatial AI landscape
While the distinction between Spatial Computing and Spatial AI provides a foundational understanding, Spatial AI encompasses a diverse range of technologies. To simplify this, I’ve created a market map using a 2x2 matrix with the following axes:
Axis 1: Real World vs. Virtual World
This axis differentiates where spatial technologies operate and derive data:
Real World: Technologies that work with existing physical spaces and their digital representations, i.e., digital twins.
Virtual World: Technologies that create and navigate synthetic digital environments.
In essence, the distinction lies in the origin and purpose: Real World technologies engage with actual physical reality, while Virtual World technologies operate in computer-generated environments that do not necessarily correspond to physical-world locations.
Axis 2: Spatial Intelligence vs. Foundation Models
This axis distinguishes how AI approaches spatial problems:
Spatial Intelligence: Specialized computer vision/AI/ML algorithms optimized for specific spatial tasks, such as perception, reconstruction, localization, and navigation.
Foundation Models: Large-scale, multimodal AI systems capable of generalized spatial reasoning and generation.
This distinction separates task-specific AI (Spatial Intelligence) from generalized AI (Foundation Models) with broader spatial capabilities.
This framework yields four categories of Spatial AI:
1. Real World Spatial Intelligence
Technologies that perceive, understand, and navigate the physical world using specialized algorithms
Use Case: When a delivery robot navigates sidewalks to avoid obstacles while delivering food, it relies on real-world spatial intelligence.
Example technologies: SLAM (Simultaneous Localization and Mapping), LiDAR-based mapping, GIS, computer vision, digital twins.
Example companies & applications:
Autonomous Vehicles (Waymo): Cars that can "see" roads, detect obstacles, and navigate to destinations
3D scanning & mapping apps (Matterport, Polycam, Scaniverse): Apps that scan and create 3D models of your home
Robot warehouse workers (Amazon Robotics): Robots that navigate warehouse floors to pick and sort items
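To make the navigation piece concrete, here is a minimal, self-contained sketch of the kind of grid-based path planning a warehouse or delivery robot performs. Real systems plan over richer maps (often built via SLAM) with planners like A*, but the breadth-first search below captures the core idea; the grid and coordinates are invented for illustration:

```python
from collections import deque

def shortest_path(grid, start, goal):
    """Breadth-first search over a 2D occupancy grid.

    grid: list of strings, '.' = free cell, '#' = obstacle.
    Returns a list of (row, col) cells from start to goal, or None.
    """
    rows, cols = len(grid), len(grid[0])
    queue = deque([start])
    came_from = {start: None}  # maps each visited cell to its parent
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Reconstruct the path by walking parents back to the start.
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] != '#' and (nr, nc) not in came_from):
                came_from[(nr, nc)] = (r, c)
                queue.append((nr, nc))
    return None  # goal unreachable

# A robot at the top-left navigating around shelving ('#') to a pick location.
warehouse = [
    "....",
    ".##.",
    "....",
]
path = shortest_path(warehouse, (0, 0), (2, 3))
```

BFS guarantees the shortest path on an unweighted grid; production planners swap in A* with a distance heuristic and replan continuously as the map updates.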
2. Virtual World Spatial Intelligence
Technologies that create, simulate, and interact within virtual environments.
Use Case: When you use a VR headset to walk through a virtual house model and rearrange furniture collaboratively, you're engaging with virtual world spatial intelligence.
Example technologies: 3D game engines (Unity, Unreal), VR spatial mapping, metaverse platforms, and AR/VR collaboration tools.
Example companies & applications:
Video game engines (Unity, Unreal Engine): Tools that create 3D spaces for games
Virtual collaboration rooms (Meta Horizon): Digital spaces where people can meet and work together
3. Real World Foundation Models
AI systems that understand and reason about physical spaces
Use case: AI analyzing a photo of your living room and suggesting furniture layouts.
Example technologies: Multimodal AI (vision-language models, embodied AI), geospatial AI (Google Earth, Niantic LGM), etc.
Example companies & applications:
Humanoid robots (Figure AI): Robots that can understand verbal instructions about physical tasks
AI perception models (Anthropic, OpenAI, Gemini): Multimodal models integrating text, images, and spatial data for reasoning
Geospatial AI (Niantic LGM, Gemini with Google Earth, Atlas AI): Models that understand satellite imagery, create intelligent extrapolations of real locations, and describe what's happening in them
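As a toy illustration of the geospatial side, here is the haversine formula for great-circle distance, one of the basic primitives geospatial pipelines build on before any AI enters the picture (the coordinates below are approximate):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometers."""
    R = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * R * math.asin(math.sqrt(a))

# Approximate distance from San Francisco to Los Angeles (~560 km).
dist = haversine_km(37.7749, -122.4194, 34.0522, -118.2437)
```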
4. Virtual World Foundation Models
AI systems that generate worlds, 3D assets, and enhance virtual environments.
Use case: A developer types "medieval tavern with rustic wooden beams", and the AI generates a fully textured 3D scene.
Example technologies: Large world models, generative scene prediction models, etc.
Example applications:
Text/image/video-to-3D: Type "tropical beach house" and get a 3D model
Virtual world builders: AI that can generate entire virtual environments and predict the next scene/actions from descriptions
Example companies: World Labs, Meta (3DGen), Odyssey, Google DeepMind, Luma AI, RunwayML, NVIDIA
Below is a market map of example companies developing or leveraging these technologies:
Note: This 2x2 framework is one way to structure the landscape, primarily from an algorithmic perspective. This market map is not exhaustive and radically simplifies a complex field. Breakthrough research areas like Neural Radiance Fields (NeRFs), 3D Gaussian Splatting, continuous perception systems, specific model architectures (e.g., diffusion models, transformers adapted for 3D), etc span across these categories and continue to evolve rapidly.
Why now and what makes it more relevant today
The increasing relevance of Spatial AI today is fueled by its convergence with multimodal generative AI. The market map illustrates how generalized foundation models are increasingly addressing tasks traditionally handled by specialized algorithms.
At its core, this integration combines two powerful technological approaches:
Spatial AI focuses on enabling machines to understand and interact with space through capabilities like localization (knowing where you are), reconstruction (building representations of the environment), 3D object recognition, path planning, and scene understanding (interpreting the context of a spatial environment). It provides the context of the physical and virtual worlds.
Multimodal Generative AI centers around AI systems that process and generate content across diverse data types (text, images, audio, video, sensor data). It offers the tools and capabilities to create comprehensive world understanding and interactions within these spatial environments.
This synergy allows us to move beyond mere perception to true spatial reasoning and generation. Through this convergence, Spatial AI is now able to incorporate critical spatial dimensions (geometry, depth, spatial relationships) beyond just traditional flat data processing.
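To ground the "localization" capability mentioned above, here is a toy 1D Kalman filter, a classic building block for fusing noisy sensor readings into a position estimate. The readings and noise parameters are invented for illustration; real localization stacks run multi-dimensional filters or factor graphs:

```python
def kalman_1d(measurements, q=0.01, r=0.5):
    """Toy 1D Kalman filter: fuse noisy position readings into one estimate.

    q: process noise variance, r: measurement noise variance.
    Returns the filtered position estimate after all measurements.
    """
    x, p = 0.0, 1.0  # initial state estimate and its variance
    for z in measurements:
        p += q                # predict: uncertainty grows over time
        k = p / (p + r)       # Kalman gain: how much to trust the new reading
        x += k * (z - x)      # update the estimate toward the measurement
        p *= (1 - k)          # uncertainty shrinks after the update
    return x

# Noisy range readings of a robot actually sitting near position 5.0.
readings = [5.2, 4.9, 5.1, 4.8, 5.0, 5.3, 4.7]
estimate = kalman_1d(readings)
```

Each reading nudges the estimate toward the true position while the filter's uncertainty shrinks, which is the same predict-update loop that underpins SLAM and sensor fusion at scale.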
Trained on diverse multimodal datasets, spatial foundation models are paving the way for more versatile and intelligent Spatial AI systems capable of adapting to a wider range of challenges and environments, explaining the dynamic shift we observe in the market. Together, they:
Help robots move like humans
Let you design a room by just describing it
Power smart glasses or autonomous drones
Build digital twins
With spatial foundation models, these applications will become faster to build, cheaper to scale, and more intelligent.
Challenges and Future Outlook
Despite the tremendous progress, several challenges remain:
Computational Requirements: Real-time spatial AI and model performance on edge devices is a beast in itself, often requiring significant processing power
Sensor Integration: Fusing data from multiple sensors remains complex
Privacy Concerns: Systems that understand physical spaces raise important privacy questions
Interoperability: Different spatial systems need common standards to work together effectively
Accessibility & Adoption: Making these technologies widely available and usable
The market is nascent, not mature—but the underlying momentum is evident.
Conclusion
Spatial AI allows machines to perceive and interact with spatial environments, driving innovation in robotics, autonomous vehicles, immersive experiences, and intelligent environments. By breaking it down into the four categories of the 2x2, we can better understand where innovation is happening and what it means for our future.
Future foundation models will enhance spatial understanding, creating more general-purpose spatial AI systems. As these technologies continue to evolve, they'll fundamentally change how we interact with our world and with each other.
For businesses and technologists looking to stay ahead of the curve, understanding the spatial AI landscape isn't just advantageous—it's essential.
This blog post represents my personal views based on my experience working in AI and computer vision at Niantic Spatial, Apple Intelligence & Samsung Research US. The market map shared is intended to provide a framework to think through a rapidly evolving technological landscape.
PS: No AI was used in refining this post…. Obviously, I'm kidding - I'm born AI-native (: