3D Gaussian Splatting (3DGS) has demonstrated superior quality in modeling 3D objects and scenes. However, generating 3DGS remains challenging due to its discrete, unstructured,
and permutation-invariant nature. In this work, we present a simple yet effective method to overcome these challenges. We utilize spherical mapping to transform 3DGS into a
structured 2D representation, termed UVGS. UVGS can be viewed as a multi-channel image whose feature dimension concatenates Gaussian attributes such as position,
scale, color, opacity, and rotation. We further find that these heterogeneous features can be compressed into a lower-dimensional (e.g., 3-channel) shared feature space, termed Super UVGS, using
a carefully designed multi-branch network. The resulting Super UVGS can be treated as a typical RGB image. Remarkably, we discover that off-the-shelf VAEs trained with latent diffusion
models can directly generalize to this new representation without additional training.
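To make the spherical mapping concrete, below is a minimal sketch of how unordered Gaussians could be scattered onto a structured UV grid. It assumes a simple equirectangular parameterization and a 14-float attribute layout; the function name, normalization, and (ignored) collision handling are illustrative choices, not the paper's exact implementation.

```python
import numpy as np

def gaussians_to_uvgs(positions, attributes, resolution=512):
    """Scatter unordered 3D Gaussians onto a structured UV grid.

    positions:  (N, 3) Gaussian centers.
    attributes: (N, C) concatenated per-Gaussian features, e.g.
                scale (3) + rotation quaternion (4) + opacity (1) + color (3).
    Returns a (resolution, resolution, 3 + C) multi-channel "UVGS image"
    whose channels stack position with the remaining attributes.
    """
    # Center the object and project each Gaussian onto the unit sphere.
    centered = positions - positions.mean(axis=0)
    x, y, z = centered[:, 0], centered[:, 1], centered[:, 2]
    r = np.linalg.norm(centered, axis=1) + 1e-8

    # Equirectangular angles: azimuth theta in [-pi, pi],
    # elevation phi in [-pi/2, pi/2].
    theta = np.arctan2(y, x)
    phi = np.arcsin(np.clip(z / r, -1.0, 1.0))

    # Quantize (theta, phi) to integer UV pixel indices.
    u = ((theta + np.pi) / (2 * np.pi) * (resolution - 1)).astype(int)
    v = ((phi + np.pi / 2) / np.pi * (resolution - 1)).astype(int)

    # Write each Gaussian's full feature vector into its UV texel.
    # (Collisions, i.e. two Gaussians landing on one texel, are ignored
    # here; the real method needs a strategy for resolving them.)
    C = attributes.shape[1]
    uvgs = np.zeros((resolution, resolution, 3 + C), dtype=np.float32)
    uvgs[v, u, :3] = positions
    uvgs[v, u, 3:] = attributes
    return uvgs
```

Reading the non-empty texels back out inverts the projection, and since each texel stores one Gaussian, capacity grows quadratically with the UV resolution, which is the scalability property noted below.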
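The multi-branch compression can likewise be sketched. The branch split, layer widths, and fusion layer below are illustrative assumptions; the paper specifies only that heterogeneous attributes are mapped into a shared low-dimensional (e.g., 3-channel) space by a multi-branch network.

```python
import torch
import torch.nn as nn

class MappingNetwork(nn.Module):
    """Hypothetical multi-branch mapper: UVGS (14 ch) -> Super UVGS (3 ch).

    One small convolutional branch per attribute group, fused into a
    shared 3-channel feature space; sizes are illustrative only.
    """
    def __init__(self, out_channels=3):
        super().__init__()
        def branch(in_ch):  # one encoder per attribute group
            return nn.Sequential(
                nn.Conv2d(in_ch, 32, 3, padding=1), nn.SiLU(),
                nn.Conv2d(32, 32, 3, padding=1), nn.SiLU(),
            )
        self.pos_branch = branch(3)   # xyz position
        self.geo_branch = branch(8)   # scale(3) + rotation(4) + opacity(1)
        self.col_branch = branch(3)   # RGB color
        self.fuse = nn.Conv2d(3 * 32, out_channels, 1)

    def forward(self, uvgs):          # uvgs: (B, 14, H, W)
        pos, geo, col = uvgs.split([3, 8, 3], dim=1)
        feats = torch.cat(
            [self.pos_branch(pos), self.geo_branch(geo), self.col_branch(col)],
            dim=1,
        )
        return self.fuse(feats)       # Super UVGS: (B, 3, H, W)
```

A mirrored inverse mapping network would expand the 3 shared channels back to the full attribute channels; training both ends with a reconstruction loss on UVGS would be one plausible setup.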
Our novel representation makes it effortless to leverage foundational 2D models, such as diffusion models, to directly model 3DGS.
Additionally, one can simply increase the 2D UV resolution to accommodate more Gaussians, making UVGS a scalable solution compared to typical 3D backbones.
This approach immediately unlocks various novel 3DGS generation applications by directly leveraging the mature capabilities of existing 2D generative models.
In our experiments, we demonstrate various diffusion-based unconditional generation, conditional generation, and inpainting applications of 3DGS, tasks that were previously non-trivial.
UVGS and Super UVGS can also be used to compress 3DGS assets by up to 99.5% using pretrained image autoencoders.
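As a back-of-envelope illustration of that claim, here is a sketch of pushing a Super UVGS image through a pretrained latent-diffusion VAE. The checkpoint choice and the 14-floats-per-Gaussian bookkeeping are illustrative assumptions, not the paper's exact setup.

```python
import torch
from diffusers import AutoencoderKL  # pretrained Stable Diffusion VAE

# Per the abstract, any VAE trained for latent diffusion should
# generalize to Super UVGS; this checkpoint is an illustrative choice.
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse").eval()

# A 3-channel Super UVGS "image" (batch, channels, H, W) in [-1, 1].
super_uvgs = torch.rand(1, 3, 512, 512) * 2 - 1

with torch.no_grad():
    latents = vae.encode(super_uvgs).latent_dist.mean  # (1, 4, 64, 64)
    recon = vae.decode(latents).sample                 # (1, 3, 512, 512)

# Rough storage comparison: raw 3DGS floats vs. VAE latent floats.
n_gaussians = 512 * 512          # one Gaussian per UV texel
raw_floats = n_gaussians * 14    # pos(3)+scale(3)+rot(4)+opacity(1)+rgb(3)
latent_floats = latents.numel()  # 4 * 64 * 64 = 16384
print(f"compression: {1 - latent_floats / raw_floats:.1%}")
# ~99.6% with these illustrative numbers, in the ballpark of the
# up-to-99.5% figure quoted above.
```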
The following figure shows a wide variety of high-quality unconditional generation results from our method. We train a diffusion model to sample Super UVGS images from random noise. The sampled Super UVGS images are then converted back to 3DGS objects using the inverse mapping network followed by inverse spherical projection, as sketched below.
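A minimal sketch of that sampling pipeline follows. The `diffusion` and `inverse_mapping_net` objects, and the `denoise_step` interface, are hypothetical stand-ins for the trained models; none of these names come from the paper's code.

```python
import torch

@torch.no_grad()
def sample_3dgs(diffusion, inverse_mapping_net, resolution=512, steps=50):
    """Sketch of the unconditional Super UVGS sampling pipeline."""
    # 1. Start from Gaussian noise in the 3-channel Super UVGS space.
    x = torch.randn(1, 3, resolution, resolution)

    # 2. Iteratively denoise to a clean Super UVGS image.
    for t in reversed(range(steps)):
        x = diffusion.denoise_step(x, t)  # assumed sampler interface

    # 3. Expand the 3 shared channels back to full UVGS attribute channels.
    uvgs = inverse_mapping_net(x)  # (1, 14, res, res): pos+scale+rot+...

    # 4. Inverse spherical projection: every non-empty texel becomes one
    #    3D Gaussian carrying the attributes stored in its channels.
    feats = uvgs.squeeze(0).permute(1, 2, 0).reshape(-1, uvgs.shape[1])
    occupied = feats.abs().sum(dim=1) > 0   # drop empty texels
    return feats[occupied]                  # (N, 14) Gaussian parameters
```

The text-conditioned variant shown next differs only in that the denoiser additionally receives a text embedding.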
The following are conditional generation results from our method. We train a text-conditioned diffusion model to sample Super UVGS images, which are converted to 3DGS objects in the same way: the inverse mapping network followed by inverse spherical projection.
Comparison of unconditional 3D asset generation on the cars category against SOTA methods. The figure shows that DiffTF produces low-quality, low-resolution cars lacking detail. While Get3D and GaussianCube achieve higher resolution, they suffer from 3D inconsistency, numerous artifacts, and a lack of rich 3D detail. In contrast, our method generates high-quality, high-resolution objects that are 3D-consistent, with sharp, well-defined edges.
We also compare the performance of our model against various SOTA methods for text-conditioned object synthesis. Our method generates high-quality 3D assets not only for simple objects but also for complicated objects with intricate geometry.
@article{rai2025uvgs,
title={UVGS: Reimagining Unstructured 3D Gaussian Splatting using UV Mapping},
author={Aashish Rai and Dilin Wang and Mihir Jain and Nikolaos Sarafianos and Arthur Chen and Srinath Sridhar and Aayush Prakash},
journal={arXiv preprint},
year={2025}
}