
Announcing Our Frontier Model for 3D World Generation

Echo turns a simple text prompt or image into a 3D-consistent world you can explore in real time.

Today, we are pleased to announce Echo, a frontier model for 3D world generation. Echo takes a text prompt or an image as input and generates a 3D-consistent world.

Interactive World Exploration

Echo's core strength lies in its ability to take a simple text prompt or an image as input and, from that, generate a rich, 3D-consistent world. This makes it a powerful tool not only for creative 3D scene generation but also for digital twinning pipelines and 3D design workflows. Imagine turning a picture of a factory floor into a structured 3D representation, or describing "a flexible exhibition space with sculptural lighting" and seeing the concept materialize as an explorable environment. Echo bridges the gap between ideas, reality, and interactive spatial design.

Echo empowers users with real-time interactivity. Once a 3D representation is generated, users gain complete control over the camera and can freely explore the newly created world from any angle, with novel views rendered instantly. This real-time rendering works even on low-end hardware, eliminating the need for expensive, specialized equipment and making high-quality 3D experiences accessible to a much broader audience, from professional designers to casual enthusiasts.

Explore Generated Scenes

Below are multiple generated scenes that you can freely explore. Navigate through each world and experience the quality of 3D generation firsthand.

[Interactive 3D viewer]

How Does It Work?

Echo follows a simple but powerful pipeline. It begins with an input image—or, when starting from text, a generated reference image. From this input, the model predicts a physically grounded 3D representation of the scene, capturing both geometry and appearance in a consistent spatial layout. This internal representation is then converted into a renderable format suitable for real-time exploration. For the web demo, we use 3D Gaussian Splatting (3DGS), which provides extremely fast, GPU-friendly rendering and makes interactive viewing possible directly in the browser, even on modest hardware.
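For concreteness, a 3DGS scene is essentially a large collection of colored, semi-transparent 3D Gaussians. The sketch below shows the kind of data such a representation holds; it is a simplification for illustration, not Echo's internal format (among other things, the original 3DGS formulation stores view-dependent spherical-harmonic colors rather than plain RGB):

```python
from dataclasses import dataclass

import numpy as np

@dataclass
class GaussianSplatScene:
    """A simplified 3DGS scene: one row of each array per Gaussian."""
    means: np.ndarray        # (N, 3) world-space centers, at metric scale
    covariances: np.ndarray  # (N, 3, 3) per-splat shape and orientation
    colors: np.ndarray       # (N, 3) RGB (3DGS proper uses spherical harmonics)
    opacities: np.ndarray    # (N,) alpha values used when compositing splats
```

Rendering a view then amounts to projecting these Gaussians into the camera and alpha-compositing them front to back, which is why 3DGS maps so well onto GPU rasterization.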

[Interactive 3D viewer]

Geometry-Grounded Scene Representation

Echo generates a geometrically-grounded scene representation. Rather than producing a set of disconnected views, the model infers a globally-consistent 3D structure that encodes both geometry and appearance at metric scale. This means that every rendered view—along with its depth map—comes from a unified spatial understanding of the scene, not from independent image hallucinations. The current implementation uses 3D Gaussian Splatting (3DGS) as the rendering primitive due to its exceptional efficiency, but the representation itself is flexible and can be converted into other formats as rendering technologies evolve.
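A practical consequence is that color and depth are two readouts of the same structure. The minimal numpy sketch below illustrates the idea with a toy pinhole projection of Gaussian centers only (a real renderer additionally splats each Gaussian's covariance and composites opacities; none of this is Echo's code):

```python
import numpy as np

def project(means, K, R, t):
    """Project world-space points into a pinhole camera.

    Pixel coordinates and depths come from the same 3D points,
    so rendered views and depth maps agree by construction.
    """
    cam = means @ R.T + t                # world -> camera coordinates
    depth = cam[:, 2]                    # metric depth along the optical axis
    homo = cam @ K.T                     # apply camera intrinsics
    pixels = homo[:, :2] / homo[:, 2:3]  # perspective divide
    return pixels, depth

rng = np.random.default_rng(0)
means = rng.uniform([-1.0, -1.0, 2.0], [1.0, 1.0, 5.0], size=(1000, 3))

K = np.array([[500.0,   0.0, 320.0],     # focal lengths and principal point
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
R, t = np.eye(3), np.zeros(3)            # identity pose: camera at the origin

pixels, depth = project(means, K, R, t)  # one scene -> view and depth
```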

We believe this geometrically faithful structure is essential for many real-world applications such as digital twinning or content creation. For example, a single photograph of a construction site or manufacturing cell can be turned into a metrically coherent 3D reconstruction, enabling teams to plan maintenance, simulate layout changes, or assess spatial constraints without rescanning the environment.

For content creation, consistent 3D worlds unlock faster iteration in graphics applications such as gaming (e.g., generating large game-level environments) and even enable more realistic training data for robotics (e.g., building geometrically accurate simulation environments for navigation and manipulation). Because Echo maintains global consistency by design, these workflows remain reliable even when starting from extremely sparse input captured with commodity devices, or when generation is conditioned on nothing more than a text prompt.

Spatial Understanding and Editing

Echo introduces a set of capabilities that make 3D world generation immediately useful across creative, design, and professional workflows.

Scene Restyling

Our built-in scene restyling mechanism allows users to restyle generated or reconstructed environments. This supports use cases ranging from interior redesign and real-estate visualization to game level restyling. Users can quickly explore variations such as "minimalist Scandinavian," "industrial loft," or "warm, rustic wood tones," without manually rebuilding the scene.
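For intuition on why restyling does not require rebuilding the scene: in a splat representation, style lives largely in the appearance parameters, while geometry stays fixed. The toy below is a hand-rolled palette transfer in numpy, not Echo's learned, prompt-driven restyling; it recolors Gaussians without touching their positions or shapes:

```python
import numpy as np

def restyle(colors, palette, strength=0.8):
    """Pull each Gaussian's RGB color toward its nearest palette color,
    leaving the geometry (means, covariances) completely untouched."""
    # Distances from every color (N, 1, 3) to every palette entry (1, P, 3).
    dists = np.linalg.norm(colors[:, None, :] - palette[None, :, :], axis=-1)
    nearest = palette[dists.argmin(axis=1)]
    return (1.0 - strength) * colors + strength * nearest

# A tiny hand-picked palette standing in for "warm, rustic wood tones".
palette = np.array([[0.55, 0.35, 0.20],
                    [0.72, 0.52, 0.30],
                    [0.40, 0.25, 0.15]])

colors = np.random.default_rng(1).uniform(size=(1000, 3))
restyled = restyle(colors, palette)  # same geometry, new look
```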

Experience different stylistic variations of the same scene in 3D.

[Interactive 3D viewer]

Scene Decomposition

"Echo" can also issue more localized scene edits. As such, it is useful to first decompose the scene into its parts to facilitate their editing. Here, Echo generates a semantic segmentation mask of the scene, identifying components such as "chair", "table", "floor", or "wall".

[Interactive 3D viewer]

Scene Editing

Echo can leverage a textual prompt, or the aforementioned scene-object decomposition, to remove, add, or replace objects in the scene. Below is a sketch of how such edits act on the decomposition, followed by example edited scenes that you can explore interactively.
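On a decomposition like the one above, removal and replacement reduce to simple set operations on the splats. This snippet illustrates the principle only; Echo's editing is learned and, unlike this toy, also fills in the geometry and appearance revealed behind an edited object:

```python
import numpy as np

def remove_object(means, colors, decomp, name):
    """Drop every Gaussian labeled `name` (decomp is a SceneDecomposition
    as sketched above); a real editor would also inpaint what it reveals."""
    keep = ~decomp.mask(name)
    return means[keep], colors[keep]

def replace_object(means, colors, decomp, name, new_means, new_colors):
    """Swap one object's Gaussians for another set of splats."""
    means, colors = remove_object(means, colors, decomp, name)
    return (np.concatenate([means, new_means]),
            np.concatenate([colors, new_colors]))
```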

[Interactive 3D viewer]

Object Removal

Pushing editing further, Echo can remove selected objects from a cluttered scene to reveal a clean, unobstructed view of the underlying space.

[Interactive 3D viewer]

Future Plans

Echo is only the first step in a much larger roadmap. We are building toward a future in which anyone can generate, modify, and reason about complex 3D environments with natural language. Upcoming versions will expand editing capabilities to support full prompt-based scene manipulation—allowing users to add, remove, rearrange, or restyle objects simply by describing their intent.

Looking further ahead, our models will add dynamics and physical reasoning on top of the underlying representation. This will enable scenes that not only look consistent but also exhibit physics-based behavior, opening the door to interactive simulations, robotics testing, and richer digital twin applications. Echo establishes the foundation; the next iterations will bring more powerful world models that evolve, respond, and interact just like their real-world counterparts.

We invite you to explore the generated scenes directly on our website, where you can navigate examples and experiment with the model's capabilities in real time. For those interested in trying Echo on their own data or workflows, we are launching a closed beta; sign up to get early access and help shape the next generation of 3D world creation tools.