I've been fortunate to work on complex technologies spanning computer vision, natural language processing, and multimodal AI over the past three years as an AI/ML PM at 100-150-person labs in San Francisco. Along the way, I've heard spatial AI, spatial computing, multimodal LLMs, embodied AI, and similar terms used interchangeably to refer to a wide variety of technologies, and sometimes inaccurately.
True to PM form, and borrowing a page from VCs (:')), this post is my attempt to demystify the complex topic of Spatial AI and establish some ground truths through a market map.
Spatial Computing vs. Spatial AI
Spatial technologies, at their core, bridge the digital and physical worlds by giving machines the gifts of perception and presence: the ability to understand, interact with, and navigate space. This convergence represents one of the most significant technological shifts underway today, yet it remains poorly understood by many outside specialized research circles.
Spatial Computing
Spatial Computing is the broader technological framework enabling machines to understand and interact with physical and virtual spaces, including:
Hardware (sensors, displays, processors)
Software systems and environments
Interaction paradigms
Infrastructure bridging digital and physical worlds
Examples: AR/VR headsets, LiDAR sensors, 3D mapping tools, robotic navigation systems
Spatial AI
Spatial AI, a subset of spatial computing, focuses on the AI/ML components that allow machines to perceive, understand, reason, and interact with spatial environments. Spatial AI includes:
Computer vision algorithms that understand scenes
AI models that can reason about spatial relationships
Machine learning systems that can predict spatial behaviors
Foundation models trained to understand physical spaces
Neural networks that generate 3D environments
Simply put: Spatial Computing provides the technological foundation, while Spatial AI provides the intelligence layer. For example, in the case of self-driving cars:
Spatial Computing for AVs encompasses the entire technological stack: the sensors that perceive the physical world, the processing systems that handle spatial data, and the interface systems that translate decisions into actions. It provides the hardware and software framework needed to operate.
Spatial AI is specifically the "brain" of the self-driving car: it processes this spatial data to build an understanding of the physical environment, reason about spatial relationships, and make navigation decisions in real-world contexts. It provides the intelligence the vehicle needs to be truly useful.
Spatial AI landscape
While the distinction between Spatial Computing and Spatial AI provides a foundational understanding, Spatial AI encompasses a diverse range of technologies. To simplify this, I’ve created a market map using a 2x2 matrix with the following axes:
Axis 1: Real World vs. Virtual World
This axis differentiates where spatial technologies operate and derive data:
Real World: Technologies that work with existing physical spaces and their digital representations, i.e., digital twins.
Virtual World: Technologies that create and navigate synthetic digital environments.
In essence, the distinction lies in the origin and purpose: Real World technologies engage with actual physical reality, while Virtual World technologies operate in computer-generated environments that do not necessarily correspond to physical-world locations.
Axis 2: Spatial Intelligence vs. Foundation Models
This axis distinguishes how AI approaches spatial problems:
Spatial Intelligence: Specialized computer vision/AI/ML algorithms optimized for specific spatial tasks, such as perception, reconstruction, localization, and navigation.
Foundation Models: Large-scale, multimodal AI systems capable of generalized spatial reasoning and generation.
This distinction separates task-specific AI (Spatial Intelligence) from generalized AI (Foundation Models) with broader spatial capabilities.
This framework yields four categories of Spatial AI:
1. Real World Spatial Intelligence
Technologies that perceive, understand, and navigate the physical world using specialized algorithms
Use Case: When a delivery robot navigates sidewalks to avoid obstacles while delivering food, it relies on real-world spatial intelligence.
Example technologies: SLAM (Simultaneous Localization and Mapping), LiDAR-based mapping, GIS, computer vision, digital twins.
Example companies & applications:
Autonomous Vehicles (Waymo): Cars that can "see" roads, detect obstacles, and navigate to destinations
3D scanning & mapping apps (Matterport, Polycam, Scaniverse): Apps that scan and create 3D models of your home
Robot warehouse workers (Amazon Robotics): Robots that navigate warehouse floors to pick and sort items
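To make the navigation piece concrete, here is a minimal, self-contained sketch of the kind of grid-based path planning a warehouse or delivery robot performs. Real systems plan over richer maps (often built via SLAM) with planners like A*, but the breadth-first search below captures the core idea; the grid and coordinates are invented for illustration:

```python
from collections import deque

def shortest_path(grid, start, goal):
    """Breadth-first search over a 2D occupancy grid.

    grid: list of strings, '.' = free cell, '#' = obstacle.
    Returns a list of (row, col) cells from start to goal, or None.
    """
    rows, cols = len(grid), len(grid[0])
    queue = deque([start])
    came_from = {start: None}  # maps each visited cell to its parent
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Reconstruct the path by walking parents back to the start.
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] != '#' and (nr, nc) not in came_from):
                came_from[(nr, nc)] = (r, c)
                queue.append((nr, nc))
    return None  # goal unreachable

# A robot at the top-left navigating around shelving ('#') to a pick location.
warehouse = [
    "....",
    ".##.",
    "....",
]
path = shortest_path(warehouse, (0, 0), (2, 3))
```

BFS guarantees the shortest path on an unweighted grid; production planners swap in A* with a distance heuristic and replan continuously as the map updates.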
2. Virtual World Spatial Intelligence
Technologies that create, simulate, and interact within virtual environments.
Use Case: When you use a VR headset to walk through a virtual house model and rearrange furniture collaboratively, you're engaging with virtual world spatial intelligence.
Example technologies: 3D game engines (Unity, Unreal), VR spatial mapping, metaverse platforms, and AR/VR collaboration tools.
Example companies & applications:
Video game engines (Unity, Unreal Engine): Tools that create 3D spaces for games
Virtual collaboration rooms (Meta Horizon): Digital spaces where people can meet and work together
3. Real World Foundation Models
AI systems that understand and reason about physical spaces
Use case: AI analyzing a photo of your living room and suggesting furniture layouts.
Example technologies: Multimodal AI (vision-language models, embodied AI), geospatial AI (Google Earth, Niantic LGM), etc.
Example companies & applications:
Humanoid robots (Figure AI): Robots that can understand verbal instructions about physical tasks
AI perception models (Anthropic, OpenAI, Gemini): Multimodal models integrating text, images, and spatial data for reasoning
Geospatial AI (Niantic LGM, Gemini with Google Earth, Atlas AI): Models that understand satellite imagery, create intelligent extrapolations of real locations, and describe what's happening in them
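As a toy illustration of the geospatial side, here is the haversine formula for great-circle distance, one of the basic primitives geospatial pipelines build on before any AI enters the picture (the coordinates below are approximate):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometers."""
    R = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * R * math.asin(math.sqrt(a))

# Approximate distance from San Francisco to Los Angeles (~560 km).
dist = haversine_km(37.7749, -122.4194, 34.0522, -118.2437)
```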
4. Virtual World Foundation Models
AI systems that generate worlds, 3D assets, and enhance virtual environments.
Use case: A developer types "medieval tavern with rustic wooden beams", and the AI generates a fully textured 3D scene.
Example technologies: Large world models, generative scene prediction models, etc.
Example applications:
Text/image/video-to-3D: Type "tropical beach house" and get a 3D model
Virtual world builders: AI that can generate entire virtual environments and predict the next scene/actions from descriptions
Example companies: World Labs, Meta (3DGen), Odyssey, Google DeepMind, Luma AI, RunwayML, NVIDIA
Below is a market map of example companies developing or leveraging these technologies:
Note: This 2x2 framework is one way to structure the landscape, primarily from an algorithmic perspective. This market map is not exhaustive and radically simplifies a complex field. Breakthrough research areas like Neural Radiance Fields (NeRFs), 3D Gaussian Splatting, continuous perception systems, specific model architectures (e.g., diffusion models, transformers adapted for 3D), etc span across these categories and continue to evolve rapidly.
Why now and what makes it more relevant today
The increasing relevance of Spatial AI today is fueled by its convergence with multimodal generative AI. The market map illustrates how generalized foundation models are increasingly addressing tasks traditionally handled by specialized algorithms.
At its core, this integration combines two powerful technological approaches:
Spatial AI focuses on enabling machines to understand and interact with space through capabilities like localization (knowing where you are), reconstruction (building representations of the environment), 3D object recognition, path planning, and scene understanding (interpreting the context of a spatial environment). It provides the context of the physical and virtual worlds.
Multimodal Generative AI centers around AI systems that process and generate content across diverse data types (text, images, audio, video, sensor data). It offers the tools and capabilities to create comprehensive world understanding and interactions within these spatial environments.
This synergy allows us to move beyond mere perception to true spatial reasoning and generation. Through this convergence, Spatial AI is now able to incorporate critical spatial dimensions (geometry, depth, spatial relationships) beyond just traditional flat data processing.
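To ground the "localization" capability mentioned above, here is a toy 1D Kalman filter, a classic building block for fusing noisy sensor readings into a position estimate. The readings and noise parameters are invented for illustration; real localization stacks run multi-dimensional filters or factor graphs:

```python
def kalman_1d(measurements, q=0.01, r=0.5):
    """Toy 1D Kalman filter: fuse noisy position readings into one estimate.

    q: process noise variance, r: measurement noise variance.
    Returns the filtered position estimate after all measurements.
    """
    x, p = 0.0, 1.0  # initial state estimate and its variance
    for z in measurements:
        p += q                # predict: uncertainty grows over time
        k = p / (p + r)       # Kalman gain: how much to trust the new reading
        x += k * (z - x)      # update the estimate toward the measurement
        p *= (1 - k)          # uncertainty shrinks after the update
    return x

# Noisy range readings of a robot actually sitting near position 5.0.
readings = [5.2, 4.9, 5.1, 4.8, 5.0, 5.3, 4.7]
estimate = kalman_1d(readings)
```

Each reading nudges the estimate toward the true position while the filter's uncertainty shrinks, which is the same predict-update loop that underpins SLAM and sensor fusion at scale.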
Trained on diverse multimodal datasets, spatial foundation models are paving the way for more versatile and intelligent Spatial AI systems capable of adapting to a wider range of challenges and environments, explaining the dynamic shift we observe in the market. Together, they:
Help robots move like humans
Let you design a room by just describing it
Power smart glasses or autonomous drones
Build digital twins
With spatial foundation models, these applications will become faster to build, cheaper to scale, and more intelligent.
Challenges and Future Outlook
Despite the tremendous progress, several challenges remain:
Computational Requirements: Real-time spatial AI and model performance on edge devices is a beast in itself, often requiring significant processing power
Sensor Integration: Fusing data from multiple sensors remains complex
Privacy Concerns: Systems that understand physical spaces raise important privacy questions
Interoperability: Different spatial systems need common standards to work together effectively
Accessibility & Adoption: Making these technologies widely available and usable
The market is nascent, not mature—but the underlying momentum is evident.
Conclusion
Spatial AI allows machines to perceive and interact with spatial environments, driving innovation in robotics, autonomous vehicles, immersive experiences, and intelligent environments. By breaking it down into the four categories of the 2x2, we can better understand where innovation is happening and what it means for our future.
Future foundation models will enhance spatial understanding, creating more general-purpose spatial AI systems. As these technologies continue to evolve, they'll fundamentally change how we interact with our world and with each other.
For businesses and technologists looking to stay ahead of the curve, understanding the spatial AI landscape isn't just advantageous—it's essential.
This blog post represents my personal views based on my experience working in AI and computer vision at Niantic Spatial, Apple Intelligence & Samsung Research US. The market map shared is intended to provide a framework to think through a rapidly evolving technological landscape.
PS: No AI was used in refining this post…. Obviously, I'm kidding - I'm born AI-native (: