¹Australian National University, ²University of Oxford, ³Beijing Academy of Artificial Intelligence, ⁴Cybever
*Equal contribution

3D-GPT employs LLMs as a multi-agent system with three collaborative agents for procedural 3D generation.


The significance of 3D asset modeling is undeniable in the metaverse era. Traditional methods for 3D modeling of realistic synthetic scenes involve the painstaking tasks of complex design, refinement, and client communication.

To reduce this workload, we introduce 3D-GPT, a framework that uses large language models (LLMs) for instruction-driven 3D modeling. 3D-GPT positions LLMs as problem solvers that break the 3D modeling task into manageable segments and determine the appropriate agent for each.

3D-GPT comprises three agents: a task dispatch agent, a conceptualization agent, and a modeling agent. Together, they pursue two goals. First, they systematically enrich concise initial scene descriptions into detailed ones, while dynamically adapting the text based on subsequent instructions. Second, they integrate procedural generation, extracting parameter values from the enriched text to interface with 3D software for asset creation.
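The division of labor among the three agents can be illustrated with a minimal sketch. This is not the authors' implementation: the function registry, the prompt format, the `LLM` callable type, and the `fake_llm` stub are all hypothetical names introduced here for illustration, and a real system would call an actual language model and pass the resulting parameters to 3D software.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for an LLM call: maps a prompt string to a reply string.
LLM = Callable[[str], str]

# Hypothetical registry of procedural functions and the parameters each accepts.
FUNCTIONS: Dict[str, List[str]] = {
    "add_trees": ["density", "leaf_color"],
    "add_sky": ["time_of_day"],
}


def task_dispatch(llm: LLM, instruction: str) -> List[str]:
    """Task dispatch agent: pick which procedural functions the instruction needs."""
    reply = llm(f"DISPATCH|{instruction}|{','.join(FUNCTIONS)}")
    # Keep only names that exist in the registry.
    return [name for name in reply.split(",") if name in FUNCTIONS]


def conceptualize(llm: LLM, instruction: str, function_name: str) -> str:
    """Conceptualization agent: enrich the short description for one function."""
    return llm(f"ENRICH|{function_name}|{instruction}")


def infer_parameters(llm: LLM, enriched: str, function_name: str) -> Dict[str, str]:
    """Modeling agent: extract a value for each parameter from the enriched text."""
    return {p: llm(f"PARAM|{p}|{enriched}") for p in FUNCTIONS[function_name]}


def run_pipeline(llm: LLM, instruction: str) -> Dict[str, Dict[str, str]]:
    """Chain the three agents; returns parameters per selected function."""
    plan: Dict[str, Dict[str, str]] = {}
    for name in task_dispatch(llm, instruction):
        enriched = conceptualize(llm, instruction, name)
        plan[name] = infer_parameters(llm, enriched, name)
    return plan


def fake_llm(prompt: str) -> str:
    """Deterministic stub used in place of a real model (assumption for the demo)."""
    kind, *rest = prompt.split("|")
    if kind == "DISPATCH":
        return "add_trees,add_sky"
    if kind == "ENRICH":
        return f"{rest[1]} (detailed for {rest[0]})"
    return f"<{rest[0]}>"  # placeholder parameter value


if __name__ == "__main__":
    print(run_pipeline(fake_llm, "a misty forest at dawn"))
```

In this sketch each agent is just a differently prompted call to the same model, which is one plausible reading of the description above; the resulting dictionary stands in for the arguments that would be handed to the procedural generator.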

We show that 3D-GPT provides trustworthy results and collaborates effectively with human designers. Furthermore, it integrates seamlessly with Blender, unlocking expanded manipulation possibilities. Our work underscores the potential of LLMs in 3D modeling, laying the groundwork for future advancements in scene generation and animation.

3D Scene Generation