GLOSSARY

Image-to-3D

Image-to-3D is an AI technique that takes one or a few images of an object and reconstructs a 3D mesh. Unlike photogrammetry, it does not need many overlapping views — modern models hallucinate the unseen back side.

Definition

Modern image-to-3D pipelines follow a two-stage pattern: a multi-view diffusion model generates synthetic views of the object from angles the user did not provide, then a mesh reconstruction model fuses the real and synthesized views into a consistent 3D shape. Trellis, Hunyuan3D, Rodin, and Stable Zero123 all follow variations of this approach.

Single-image input is the hard case — the back side is fully imagined by the model, with all the inconsistency that implies. Multi-image input (front, back, side) anchors the model and dramatically improves fidelity. Automatic3D's pipeline generates a multi-view collage internally to get those anchors.
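The two-stage pattern above can be sketched in Python. This is a minimal data-flow sketch only: both stages are stand-in functions (no diffusion or reconstruction model is actually called), and every name here is illustrative, not the API of Trellis, Hunyuan3D, or any other specific pipeline.

```python
# Illustrative sketch of the two-stage image-to-3D pattern.
# All functions are stand-ins; names and shapes are assumptions, not a real API.
from dataclasses import dataclass, field

@dataclass
class Mesh:
    vertices: list = field(default_factory=list)
    faces: list = field(default_factory=list)

def synthesize_views(input_images, target_azimuths):
    """Stage 1 stand-in: a multi-view diffusion model would render the object
    from angles the user did not provide. Here we just tag each requested
    azimuth so the data flow stays visible."""
    return [("synthetic", azimuth) for azimuth in target_azimuths]

def reconstruct_mesh(views):
    """Stage 2 stand-in: a reconstruction model fuses real and synthetic views
    into one consistent shape. Here the dummy mesh's vertex count just mirrors
    how many views were fused."""
    return Mesh(vertices=[(0.0, 0.0, 0.0)] * len(views), faces=[])

def image_to_3d(input_images):
    # User-supplied views anchor the result; the remaining angles are imagined.
    real_views = [("real", img) for img in input_images]
    synthetic_views = synthesize_views(input_images, target_azimuths=[90, 180, 270])
    return reconstruct_mesh(real_views + synthetic_views)

mesh = image_to_3d(["front.png"])  # single-image case: 1 real + 3 synthetic views
```

The same skeleton shows why multi-image input helps: extra user photos (back, side) enter as additional real views, so fewer angles have to be hallucinated in stage 1.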

Why it matters

Image-to-3D is currently the highest-quality way to get AI-generated 3D output. Text-to-3D pipelines that produce good results almost always go through an intermediate image step. For users with a reference photo, image-to-3D is a direct path to a digital twin.

Common confusion

Image-to-3D is not photogrammetry. Photogrammetry needs dozens of overlapping photos and reconstructs only what was actually captured. Image-to-3D works from one image and fills in the rest with prior knowledge — which means the result resembles the input but is not a metric reconstruction.

For real-world digital twins where geometric accuracy matters, photogrammetry or LiDAR scanning is the right tool. For quickly turning a sketch or reference image into a printable model, image-to-3D wins.

SEE ALSO