GLOSSARY

3D Diffusion Models

3D diffusion models apply the same denoising framework that powers Stable Diffusion to 3D representations — multi-view images, triplanes, voxels, or point clouds — that get converted to a mesh as the final step.

Definition

Image diffusion models learn to reverse a gradual noising process on 2D images. 3D diffusion adapts the idea to 3D representations. Common variants:

  • Multi-view diffusion: denoise several view images jointly to keep them consistent (Zero123++, MVDream)
  • Triplane diffusion: denoise three orthogonal feature planes that are decoded into geometry (Rodin)
  • Point cloud diffusion: denoise positions of a fixed-size set of points (Point-E)
  • Latent 3D diffusion: denoise in a learned compressed 3D latent space (Shap-E, Trellis, Hunyuan3D-2)
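All of these variants share the same denoising machinery and differ only in what the noised array represents (views, planes, points, or latents). A minimal sketch of that machinery, as a toy DDPM/DDIM-style sampling loop over a point cloud: the schedule values are illustrative, and `toy_denoiser` is a stand-in that cheats by looking at the known clean points, where a real system would call a trained noise-prediction network.

```python
import numpy as np

# Toy diffusion sampling over a point cloud (an N x 3 array).
# Assumption: a linear beta schedule and an epsilon-predicting "model",
# as in DDPM. Nothing here matches any particular 3D model's API.

T = 50
betas = np.linspace(1e-4, 0.02, T)        # noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)           # cumulative signal retention

rng = np.random.default_rng(0)
x0 = rng.normal(size=(256, 3))            # "clean" shape (training target)
x = rng.normal(size=x0.shape)             # sampling starts from pure noise

def toy_denoiser(x_t, t):
    # Stand-in for a trained network: we CHEAT by recovering the exact
    # noise from the known clean points. A real model predicts this.
    return (x_t - np.sqrt(alpha_bars[t]) * x0) / np.sqrt(1.0 - alpha_bars[t])

# Deterministic (DDIM-style) reverse loop: predict the clean signal,
# then step to the next lower noise level.
for t in range(T - 1, 0, -1):
    eps = toy_denoiser(x, t)
    x0_pred = (x - np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alpha_bars[t])
    x = np.sqrt(alpha_bars[t - 1]) * x0_pred + np.sqrt(1.0 - alpha_bars[t - 1]) * eps

print("max reconstruction error:", float(np.abs(x - x0).max()))
```

Because the stand-in denoiser is exact, the loop recovers the clean points almost perfectly; with a learned network the same loop produces novel shapes instead.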

Why it matters

Most current production-quality text-to-3D and image-to-3D systems are diffusion-based. Diffusion models scale with data, generalize across object categories, and condition cleanly on text or images, and they sample a shape in seconds rather than optimizing each one for minutes to hours as earlier score-distillation methods (e.g., DreamFusion) did. The leading open models in 2025–2026 (Hunyuan3D, Trellis, Stable Zero123) all use diffusion in some form.

Common confusion

A 3D diffusion model rarely outputs a finished mesh directly. The diffusion stage produces an intermediate representation — views, points, voxels — and a separate stage (multi-view reconstruction, marching cubes, or a learned decoder) extracts a mesh. Quality and printability depend heavily on this second stage.
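The hand-off between the two stages can be illustrated with a deliberately naive extractor: counting the exposed faces of a voxel occupancy grid, the kind of intermediate a voxel-based diffusion stage might emit. Everything here is a toy sketch (the sphere grid stands in for a generated shape); real pipelines use marching cubes or learned surface decoders rather than this blocky face-counting.

```python
import numpy as np

# Stage 1 stand-in: a voxel occupancy grid, here just a hard-coded sphere
# rather than an actual diffusion output.
n = 32
idx = np.indices((n, n, n))
center = (n - 1) / 2.0
dist = np.sqrt(((idx - center) ** 2).sum(axis=0))
occ = dist < n * 0.35                     # boolean (n, n, n) occupancy

# Stage 2 stand-in: naive surface extraction. A voxel face belongs to the
# surface when the voxel is occupied and its neighbor in that direction is
# empty. Padding with zeros makes the grid boundary count as empty.
padded = np.pad(occ, 1)
exposed = 0
for dx, dy, dz in [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
                   (0, -1, 0), (0, 0, 1), (0, 0, -1)]:
    nb = padded[1 + dx:1 + dx + n, 1 + dy:1 + dy + n, 1 + dz:1 + dz + n]
    exposed += int((occ & ~nb).sum())     # occupied voxel, empty neighbor

print("occupied voxels:", int(occ.sum()))
print("exposed faces:", exposed)
```

Each exposed face would become one quad of a blocky "cuberille" mesh; swapping this loop for marching cubes yields a smooth triangle mesh instead, which is why the choice of extractor so strongly affects surface quality and printability.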

SEE ALSO