31 October: 2D to 3D Benchmark Metrics

On 24 October, 2D inputs went live on the Subnet 17 mainnet, enabling direct like-for-like comparison with existing generative 3D foundational models. This is a snapshot of the results after 1 week.

This benchmark evaluates leading generative 3D foundational models. It is inspired by 3D Arena and uses Visual Language Model judges as a non-human evaluation method.

Read more about evaluation methodology here.
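As a rough illustration of this pairwise setup, the sketch below shows how a Visual Language Model judge might be asked to compare two 3D outputs generated from the same input image. The prompt wording and the `query_vlm` helper are assumptions for illustration only, not the benchmark's actual implementation.

```python
# Minimal sketch of pairwise VLM judging. The prompt wording and the
# `query_vlm` callable are illustrative assumptions, not the benchmark's code.

def judge_pair(input_image_path, render_a_path, render_b_path, query_vlm):
    """Ask a VLM judge which 3D render better matches the 2D input image.

    `query_vlm` is a hypothetical callable that sends images plus a text
    prompt to a Visual Language Model and returns its text reply.
    """
    prompt = (
        "You are judging a 2D-to-3D generation benchmark. "
        "Given the reference image and two rendered 3D outputs (A and B), "
        "answer with exactly one of: A, B, or DRAW, based on geometry, "
        "texture fidelity, and overall aesthetics."
    )
    reply = query_vlm(
        images=[input_image_path, render_a_path, render_b_path],
        prompt=prompt,
    )
    verdict = reply.strip().upper()
    # Fall back to a draw if the judge's reply is not one of the expected labels.
    return verdict if verdict in {"A", "B", "DRAW"} else "DRAW"
```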

Motivation

Quantitatively evaluating the quality of generated 3D assets is challenging and subjective, and there is no standard practice for evaluating aesthetics in real-world applications.

This head-to-head competition leverages the strength of reasoning models at like-for-like comparisons and displays the final results side by side, with files available for download for further human evaluation.

Criteria

Models must handle image (.png, .jpg) inputs and produce mesh (.obj, .glb) or splat (.splat, .ply) outputs. They should run end-to-end without human intervention, including UV unwrapping, texture mapping, and other post-processing.
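A minimal sketch of how these file-type criteria might be checked automatically is shown below. The extension sets mirror the formats listed above, but the helper itself is an illustrative example, not part of the benchmark harness.

```python
from pathlib import Path

# Accepted formats taken from the criteria above. This validation helper is
# an illustrative sketch, not part of the benchmark harness.
IMAGE_INPUTS = {".png", ".jpg"}
MESH_OUTPUTS = {".obj", ".glb"}
SPLAT_OUTPUTS = {".splat", ".ply"}

def validate_submission(input_path: str, output_path: str) -> bool:
    """Return True if the file types match the benchmark's input/output criteria."""
    in_ext = Path(input_path).suffix.lower()
    out_ext = Path(output_path).suffix.lower()
    return in_ext in IMAGE_INPUTS and out_ext in (MESH_OUTPUTS | SPLAT_OUTPUTS)

# Example: validate_submission("chair.png", "chair.glb") -> True
```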

Contributing

All inputs and outputs are publicly available here. Input image URLs are provided here.
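For anyone wanting to inspect or re-evaluate the data locally, a sketch along the following lines could fetch the published input images. The list URL below is a placeholder; substitute the actual link provided above.

```python
import urllib.request
from pathlib import Path

# Placeholder for the published list of input image URLs; replace with the
# actual link from the paragraph above.
IMAGE_LIST_URL = "https://example.com/input_image_urls.txt"

def download_inputs(dest_dir: str = "inputs") -> None:
    """Download every input image listed at IMAGE_LIST_URL into dest_dir."""
    Path(dest_dir).mkdir(exist_ok=True)
    with urllib.request.urlopen(IMAGE_LIST_URL) as resp:
        urls = [line.strip() for line in resp.read().decode().splitlines() if line.strip()]
    for url in urls:
        filename = Path(dest_dir) / url.rsplit("/", 1)[-1]
        urllib.request.urlretrieve(url, filename)
```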

Results

404 v. CSM Cube: 404 wins 43, CSM Cube wins 16, draws 42. Results & Data

404 v. Trellis: 404 wins 39, Trellis wins 17, draws 45. Results & Data

404 v. Hunyuan 2.1: 404 wins 70, Hunyuan wins 15, draws 16. Results & Data

404 v. Meshy: 404 wins 97, Meshy wins 2, draws 1. Results & Data
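One way to summarise the head-to-head numbers above is the win rate among decided (non-draw) comparisons. The short script below reproduces that calculation from the reported tallies; the percentages it prints are derived only from the figures listed above.

```python
# Win rate among decided (non-draw) comparisons, computed from the tallies above.
results = {
    "CSM Cube":    {"404": 43, "opponent": 16, "draw": 42},
    "Trellis":     {"404": 39, "opponent": 17, "draw": 45},
    "Hunyuan 2.1": {"404": 70, "opponent": 15, "draw": 16},
    "Meshy":       {"404": 97, "opponent": 2,  "draw": 1},
}

for opponent, r in results.items():
    decided = r["404"] + r["opponent"]
    win_rate = r["404"] / decided if decided else 0.0
    print(f"404 v. {opponent}: {win_rate:.0%} of decided comparisons won by 404")
```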

Future Benchmarks

This set of generations (1 week after launching on mainnet) has been submitted to 3D Arena for human-in-the-loop ranking on their leaderboard.

This benchmark will be updated with additional closed-source models on an ongoing basis as new models are released.
