Abstract
With the rapid development of computer graphics and 3D computer vision, the field of reconstructing 3D geometric shapes from sketches has seen an influx of innovative methodologies. However, most current methods rely primarily on associative matching between images and models during training and fail to fully grasp the real structure of the three-dimensional objects depicted in the images. These approaches often lead to significant discrepancies between the reconstructed objects and the expected models. For instance, some methods lose important geometric details when dealing with objects that have complex topological structures, markedly reducing the visual consistency between the reconstructed models and the original images. Moreover, existing techniques show clear limitations in capturing the subtle symmetry and local features of three-dimensional objects. To address these issues, we propose a two-phase framework that more faithfully reflects the objects in the input images in the reconstructed models. Initially, we employ an encoder–decoder structure to generate an implicit signed distance field (SDF) representing the 3D shape. Subsequently, we carry out a comprehensive optimization of the decoder from the first phase. This process includes two main steps: first, differentiable rendering is used to render the mesh model derived from the distance field and enforce its consistency with the input image; second, the symmetry of 3D shapes is combined with a novel regularization loss to further refine the decoder and reduce the discrepancies between the images and the 3D shapes. Compared with similar research, our method not only demonstrates superior performance in reconstructing 3D shapes from sketches but also offers new perspectives and solutions for 3D shape optimization. This work marks an important step toward understanding and reconstructing the transition from sketches to precise 3D models. Code:
Introduction
In the fields of computer vision and computer graphics, single-view 3D reconstruction, which infers an object’s 3D structure from a two-dimensional image, has become a focal point of research, especially in rapidly evolving application areas such as medical diagnostics [1], virtual reality (VR) [2, 3], and augmented reality (AR) [4, 5]. The key to this technology lies in its ability to meet the growing demand for 3D content, particularly against the backdrop of rapid advancements in AR/VR technologies and portable display devices. Among various input media, hand-drawn sketches have garnered special attention from researchers due to their intuitiveness and ease of acquisition [6]. Although sketches often lack detailed information or may be overly simplified, they still effectively convey the core structural information of an object, providing a direct pathway from two dimensions to three. However, accurately reconstructing rich 3D details from these abstract lines remains a challenging task: it requires algorithms not only to understand the lines and shapes within the sketch but also to infer non-intuitive geometric details. Existing 3D reconstruction technologies tend to focus on recovering object shapes from high-resolution pictures or detailed wireframes [7, 8] and are less effective with informal and incomplete inputs like sketches. Despite breakthroughs brought about by deep learning, these methods typically rely on vast amounts of training data and complex network structures, and they face limitations in handling the abstractness and sparsity of hand-drawn sketches.
To address these challenges, we propose an optimization strategy that refines existing deep encoder–decoder networks to enhance reconstruction from sketches to 3D models. In the feature extraction phase, inspired by advanced high-quality feature engineering techniques [9], we deeply optimize the extraction of features from sparse data, aiming to more effectively capture and utilize the key information contained within sketches. In the model refinement phase, unlike previous studies that introduce external refinement networks, our approach directly enhances the decoder network, improving the precision and detail of the reconstructed models without increasing model complexity.

In the first phase, our deep encoder–decoder network extracts a shape encoding from a single sketch and generates an initial 3D shape. This phase focuses on capturing the basic geometric shapes and topological structure within the sketch, providing a solid foundation for the subsequent refinement process. In the refinement phase, we employ a series of optimization steps specifically designed to improve the geometric consistency and visual realism of the preliminary 3D models. By integrating symmetry constraints and depth detail enhancement mechanisms within the decoder network, we significantly enhance the quality of the reconstructed models while reducing reliance on large volumes of training data. This optimization strategy not only refines the overall structure of the model but also allows for targeted adjustment of detail features, ensuring that the final reconstructed 3D model aligns more closely with the original sketch and possesses a higher degree of realism.

We conducted comprehensive evaluations on multiple datasets, including the ShapeNetCore.v1 [10] synthetic sketch dataset and the real sketch dataset ShapeNet-Sketch [6]. The experimental results demonstrate that our method outperforms existing technologies in sketch-based 3D reconstruction, particularly in addressing the abstractness and sparsity of sketches. Moreover, our optimization strategy enables the model to better capture the geometric information implied by the sketch, generating more refined and accurate 3D models. In summary, our main contributions include:
We propose a method that directly optimizes deep encoder–decoder networks to enhance the reconstruction of 3D models from sketches.
We integrate symmetry constraints and depth detail enhancement mechanisms for refined processing of the reconstructed objects.
We conduct comprehensive evaluations across multiple datasets, demonstrating the effectiveness and superiority of our method.
Related work
Sketch-based 3D modeling
Over the years, the field of sketch-based 3D modeling has been a focal point for researchers, with a multitude of methods being proposed. Early efforts primarily revolved around using traditional media, such as paper, for drawing two-dimensional sketches and subsequently converting these sketches into 3D models. These initial attempts laid the groundwork for sketch-based 3D modeling techniques, with comprehensive reviews by Bonnici and Olsen providing valuable references for subsequent research [11, 12]. Plumed et al. focused on utilizing polyhedral shapes in 2D engineering sketches to reconstruct Constructive Solid Geometry (CSG) models by extracting datums, showcasing a novel perspective on CSG model construction [13]. Tanaka et al. developed a method capable of automatically converting sketches of mechanical objects into 3D models, offering an effective solution for the field of mechanical design [14]. Camba et al. provided a comprehensive analysis of the current state and future opportunities of sketch-based modeling (SBM) in mechanical engineering design, highlighting that current SBM tools have yet to offer additional value over traditional media, while exploring the potential for improving SBM tools under new technological trends [15].

As technology evolved, methods for 3D modeling based on two-dimensional sketches diversified into interactive and end-to-end approaches. Interactive methods emphasize user participation, requiring strategic knowledge to decompose complex tasks into multiple sequential steps or specific drawing gestures and annotations [16–22]. Although these methods offer greater flexibility and precision, they may present a steep learning curve for inexperienced users. Conversely, end-to-end methods, particularly those utilizing template primitives or retrieval-based techniques [23–29], offer a more direct, user-friendly modeling approach. These methods allow for the direct creation of 3D models from sketches, bypassing tedious intermediate steps and enabling a rapid transformation from concept to model. However, this convenience often comes at the expense of customizability, limiting users’ ability to create unique and personalized 3D models.

In recent years, the rapid development of deep learning has led to significant accomplishments in fields such as medical image segmentation [30], image synthesis [31], and image stylization [32], bringing new research momentum to sketch-based 3D modeling. Particularly in the domain of single-view 3D reconstruction [28, 33–36], directly reconstructing 3D models from sketches using deep neural networks has gradually become a new research trend. These methods, by learning the complex relationships between sketches and 3D shapes from large datasets, are capable of transforming highly abstract sketches into precise 3D models. However, this transformation faces numerous challenges, mainly due to the inherent sparsity and abstractness of sketches. Sketches often lack the detailed boundary and texture information required for accurate depth estimation, which limits the precision of the reconstructed 3D models. To address the challenges of generating 3D models from sketches, our study introduces a method that leverages the symmetry of models, further enhancing accuracy and realism through quantitative analysis of symmetry strength.
Additionally, we employ a series of unsupervised learning techniques for a more precise representation of implicit functions, effectively resolving the uncertainties in the reconstruction process. Our focus is not only on accurately capturing 3D information but also on enriching the model’s details and enhancing its realism. Through this unique methodology, our research not only provides an efficient solution for the transformation from two-dimensional sketches to 3D models but also establishes new technical standards in the field of sketch-based 3D modeling, significantly improving the precision and efficiency in handling abstract inputs.
Single-view 3D reconstruction
In the fields of computer vision and computer graphics, single-view 3D reconstruction has long been a highly challenging problem. The emergence of large-scale datasets like ShapeNet has made data-driven approaches increasingly popular in recent years. These methods leverage vast amounts of data to train models for inferring 3D structures from single-view images. Notably, some studies utilize category-level information to derive 3D representations from a single image [16, 37], while others directly generate 3D models from 2D images, with differentiable rendering techniques playing a crucial role in achieving this goal [17–19]. Additionally, recent advancements have introduced unsupervised techniques for implicit function representation using differentiable rendering [20, 38], further expanding the scope of single-view 3D reconstruction methods. However, most existing methods focus on learning 3D geometries from color 2D images [37, 39–41], whereas our research aims to generate 3D meshes from two-dimensional sketches, a more abstract and sparse form of image representation. Unlike color images, sketches lack important cues such as texture, lighting, and shading, which are crucial for accurately inferring 3D geometries. Furthermore, sketches are often incomplete, and a single set of lines can be interpreted differently in 3D space, introducing additional ambiguity to the problem. In light of the difficulties in generating high-quality 3D shapes, we propose a solution aimed at reconstructing high-quality three-dimensional forms from vague and sparse sketches. This method integrates symmetry exploration with a metric-based evaluation of symmetry strength, complemented by unsupervised learning for a more accurate mapping of implicit functions. Our focus extends beyond merely extracting accurate 3D information from sketches; we aim to enhance the richness of details and overall realism, thus elevating the quality of the reconstructed models. This strategy ensures a smooth transformation from sketches to 3D meshes and surpasses the limitations of traditional methods in dealing with ambiguous and sparse data, highlighting the potential of data-driven techniques for converting abstract sketches into precise 3D shapes (Fig. 2).
Fig. 1 [Images not available. See PDF.]
Overall structure of the network. This architecture employs a two-stage generative refinement process. The first stage, comprising an encoder–decoder setup, generates an initial mesh from a hand-drawn sketch. The second stage introduces a contour loss and a symmetry loss, utilizing the symmetry strength measure, denoted as ’score’ (red regions indicate a very low symmetry strength score), to improve the generation quality of asymmetric objects and achieve detailed mesh optimization, catering to precise manufacturing and design requirements
Proposed method
The methodology for single-view reconstruction from sketches leverages a single hand-drawn sketch of an object to reconstruct its 3D shape. This process maps a hand-drawn sketch of width $W$ and height $H$, denoted as $I \in \mathbb{R}^{W \times H}$, to an implicit signed distance function (SDF) through an encoder–decoder network. The SDF represents a fully differentiable surface parametrization capable of depicting any topology.
Fig. 2 [Images not available. See PDF.]
Architecture of the residual network used to compute the symmetry strength score
Framework overview
Our approach, as Fig. 1 illustrates, unfolds in two phases: model generation and model optimization. Initially, the encoder network maps the input sketch into the shape space to generate a shape encoding, which is then combined with 3D point coordinates to decode the corresponding SDF values of 3D points.
The signed distance function (SDF) is a pivotal concept: a continuous function that assigns to every point in 3D space a value indicating its distance to the nearest surface. The value is signed, indicating whether the point lies outside (positive value) or inside (negative value) the object:

$$\mathrm{SDF}(\boldsymbol{p}) = s, \qquad \boldsymbol{p} \in \mathbb{R}^{3},\ s \in \mathbb{R} \tag{1}$$

where $\boldsymbol{p}$ represents a point in 3D space and $s$ denotes the signed distance value. The underlying surface is implicitly represented by the zero isosurface of the SDF. We can employ ray casting or mesh generation via the Marching Cubes algorithm for rasterization, thus visualizing the anticipated 3D model and laying the groundwork for subsequent optimization steps. By comparing the rendered silhouette images with the original sketch, we adjust the network to ensure higher accuracy and consistency. Moreover, we meticulously consider natural properties of 3D models, such as symmetry. By constraining the model’s symmetry, we achieve reconstructions more closely aligned with the input sketch. Additionally, we introduce a key parameter, the symmetry strength measure, enabling high-quality reconstruction of less symmetric objects.
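To make the SDF-to-mesh step concrete, the short Python sketch below samples an analytic SDF (a sphere, used purely as a stand-in for the predicted field) on a regular grid and extracts the zero isosurface with Marching Cubes via scikit-image. The grid resolution and the sphere SDF are illustrative assumptions, not part of the paper's pipeline.

```python
import numpy as np
from skimage import measure

def sphere_sdf(points, radius=0.5):
    """Signed distance to a sphere: negative inside, positive outside."""
    return np.linalg.norm(points, axis=-1) - radius

# Sample the SDF on a regular 64^3 grid covering [-1, 1]^3.
res = 64
axis = np.linspace(-1.0, 1.0, res)
grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)
sdf_values = sphere_sdf(grid.reshape(-1, 3)).reshape(res, res, res)

# Marching Cubes recovers the zero-level set as a triangle mesh (vertices and faces).
verts, faces, normals, _ = measure.marching_cubes(
    sdf_values, level=0.0, spacing=(2.0 / (res - 1),) * 3)
print(verts.shape, faces.shape)
```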
Stage 1: encode and decode
Our proposed infrastructure is comprised of an encoder–decoder framework. During training, the encoder transforms the hand-drawn sketch into latent shape encodings, capturing the sketch’s fundamental features and essence, thus providing a high-level abstract description of the 3D shape. Subsequently, the decoder combines the latent shape encoding with points in 3D space to predict the SDF values, collectively forming an SDF based on the decoder, thereby achieving precise 3D reconstruction.
The encoder plays a crucial role in this process. Hence, we have carefully selected the encoder to ensure it captures detailed semantic information during the training phase. By analyzing hand-drawn sketches, the encoder extracts key visual features and transforms them into a format suitable for further processing by the decoder, laying a solid foundation for subsequent 3D reconstruction tasks and enabling the decoder to accurately reproduce the 3D shape depicted by the sketch. The encoder’s formula is as follows:
$$\boldsymbol{z} = E_{\theta}(I) \tag{2}$$

where $I$ represents the input hand-drawn sketch, $E_{\theta}$ is the encoder function performing feature extraction and transformation operations, and $\boldsymbol{z}$ is the shape encoding extracted from sketch $I$. To derive the SDF conditioned on the decoder’s shape encoding, we constructed a dataset containing $N$ shapes, each represented by an SDF:

$$\{\mathrm{SDF}^{i}\}_{i=1}^{N} \tag{3}$$
We prepared a set of samples consisting of 3D point samples and their corresponding SDF values:

$$X_{i} = \big\{ (\boldsymbol{p}_{j}, s_{j}) : s_{j} = \mathrm{SDF}^{i}(\boldsymbol{p}_{j}) \big\} \tag{4}$$

where each sample consists of a point $\boldsymbol{p}_{j}$ in 3D space and its corresponding SDF value $s_{j}$, indicating the distance of point $\boldsymbol{p}_{j}$ to the nearest surface, with the sign of $s_{j}$ indicating whether $\boldsymbol{p}_{j}$ lies outside or inside the object. For the decoder, since features are extracted from the input image using the encoder, each latent encoding $\boldsymbol{z}_{i}$ is paired with a training shape $X_{i}$. For a given hand-drawn sketch, we prepared a set of paired data $(\boldsymbol{z}_{i}, X_{i})$, including 3D point samples and their SDF values. In the latent shape encoding space, the posterior probability of the shape encoding $\boldsymbol{z}_{i}$ given the shape’s SDF samples $X_{i}$ can be decomposed as:
$$p_{\phi}(\boldsymbol{z}_{i} \mid X_{i}) = p(\boldsymbol{z}_{i}) \prod_{(\boldsymbol{p}_{j}, s_{j}) \in X_{i}} p_{\phi}(s_{j} \mid \boldsymbol{z}_{i}; \boldsymbol{p}_{j}) \tag{5}$$

where $p_{\phi}(s_{j} \mid \boldsymbol{z}_{i}; \boldsymbol{p}_{j})$ represents the probability of observing a specific SDF value $s_{j}$ given the latent encoding $\boldsymbol{z}_{i}$ and point $\boldsymbol{p}_{j}$. Subsequently, latent encodings and 3D point samples are input to the decoder, which predicts the SDF values of the corresponding 3D points. By calculating the loss between predicted and true SDF values, we optimize the encoder–decoder network:
$$\mathcal{L}_{\mathrm{SDF}} = \sum_{(\boldsymbol{p}_{j}, s_{j}) \in X} \big| f_{\phi}(\boldsymbol{z}, \boldsymbol{p}_{j}) - s_{j} \big| \tag{6}$$

where $X$ represents the set of sampled 3D points, $\boldsymbol{z}$ denotes the latent vector output by the encoder, $f_{\phi}$ is the decoder, and $s_{j}$ signifies the true SDF value. This loss aims to minimize the discrepancy between the predicted SDF values and the true values, thereby enhancing the reconstruction’s accuracy. Upon completing the training phase and fixing the network parameters $\theta$ and $\phi$, our testing process commences with an input hand-drawn sketch. The sketch is initially processed by the encoder and transformed into a shape encoding $\boldsymbol{z}$, which captures the core features of the 3D shape represented by the sketch. Combining this shape encoding $\boldsymbol{z}$ with query points, our method predicts the SDF value of each point within the designated space. This process involves a point-by-point analysis throughout the entire space, thereby collectively constructing a comprehensive SDF representation of the 3D model depicted by the sketch.
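The following PyTorch sketch illustrates the stage-1 idea under stated assumptions: a toy CNN encoder (the paper does not specify its exact architecture here) produces a latent code $\boldsymbol{z}$, an MLP decoder predicts SDF values for query points conditioned on $\boldsymbol{z}$, and an L1 loss between predicted and ground-truth SDF samples drives training (Eqs. 2-6). Layer sizes and the dummy data are placeholders, not the authors' configuration.

```python
import torch
import torch.nn as nn

class SketchEncoder(nn.Module):
    def __init__(self, latent_dim=256):
        super().__init__()
        self.conv = nn.Sequential(                      # toy CNN; a deeper backbone is typical
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1))
        self.fc = nn.Linear(64, latent_dim)

    def forward(self, sketch):                          # sketch: (B, 1, H, W)
        return self.fc(self.conv(sketch).flatten(1))    # z: (B, latent_dim)

class SDFDecoder(nn.Module):
    def __init__(self, latent_dim=256, hidden=512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(latent_dim + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, z, points):                       # points: (B, N, 3)
        z_tiled = z.unsqueeze(1).expand(-1, points.shape[1], -1)
        return self.mlp(torch.cat([z_tiled, points], dim=-1)).squeeze(-1)  # (B, N)

# One training step on dummy data: L1 loss between predicted and ground-truth SDF samples.
encoder, decoder = SketchEncoder(), SDFDecoder()
sketch = torch.rand(4, 1, 224, 224)
points, sdf_gt = torch.rand(4, 2048, 3) * 2 - 1, torch.rand(4, 2048) - 0.5
loss = (decoder(encoder(sketch), points) - sdf_gt).abs().mean()
loss.backward()
```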
Table 1. Quantitative evaluation on the ShapeNet-Synthetic dataset (Chamfer Distance ↓)
| Method | Airplane | Bench | Cabinet | Car | Chair | Display | Lamp |
|---|---|---|---|---|---|---|---|
| Sketch2Model | 20.7439 | 22.7456 | 24.4748 | 11.8480 | 39.6958 | 22.8501 | 46.2840 |
| Deep3DSketch | 20.3943 | 27.0209 | 26.0595 | 15.9253 | 37.2563 | 20.0176 | 55.0070 |
| Ours | 16.6097 | 19.2898 | 21.2740 | 4.5490 | 9.9144 | 11.0213 | 38.6365 |

| Method | Loudspeaker | Rifle | Sofa | Table | Telephone | Watercraft | Mean |
|---|---|---|---|---|---|---|---|
| Sketch2Model | 29.2831 | 6.0627 | 42.1575 | 50.1852 | 10.4159 | 26.2253 | 27.1517 |
| Deep3DSketch | 32.6578 | 6.1660 | 47.3105 | 58.1760 | 10.1290 | 26.3143 | 29.4180 |
| Ours | 22.2488 | 4.9326 | 11.2237 | 12.9489 | 10.8285 | 20.7988 | 15.7135 |
Bold indicates the best result in this category
Table 2. Quantitative evaluation on the ShapeNet-Synthetic dataset (Voxel IoU ↑)
| Method | Airplane | Bench | Cabinet | Car | Chair | Display | Lamp |
|---|---|---|---|---|---|---|---|
| Sketch2Model | 0.5158 | 0.5326 | 0.5835 | 0.7152 | 0.5345 | 0.5992 | 0.5244 |
| Deep3DSketch | 0.5005 | 0.5157 | 0.5525 | 0.6773 | 0.5249 | 0.6115 | 0.4993 |
| Ours | 0.5456 | 0.6476 | 0.6206 | 0.8487 | 0.6946 | 0.6532 | 0.5446 |

| Method | Loudspeaker | Rifle | Sofa | Table | Telephone | Watercraft | Mean |
|---|---|---|---|---|---|---|---|
| Sketch2Model | 0.6508 | 0.7229 | 0.6143 | 0.4606 | 0.8129 | 0.6120 | 0.6061 |
| Deep3DSketch | 0.6258 | 0.7163 | 0.6005 | 0.4504 | 0.8126 | 0.5948 | 0.5909 |
| Ours | 0.7179 | 0.7392 | 0.7393 | 0.6571 | 0.7783 | 0.6245 | 0.6778 |
Bold indicates the best result in this category
Stage 2: model optimization
Contour optimization
After the network weights have been learned, the decoder $f_{\phi}$ can map the sketch’s corresponding latent vector $\boldsymbol{z}$ to a signed distance function. Utilizing the Marching Cubes algorithm, a technique commonly employed for extracting isosurfaces, we convert the signed distance function into a set of vertices and faces, constructing a renderable mesh model. Then, based on the camera position and pose associated with the input sketch, we render this mesh model to produce a contour image of the mesh. By calculating the loss between the generated mesh contour image and the input sketch’s contour image, we optimize the model’s surface contours. This optimization is conducted from the selected viewpoint, adjusting the decoder network’s weights through backpropagation. The contour loss function is defined as follows:
$$\mathcal{L}_{\mathrm{sil}} = \big\| R(M) - S \big\|_{2}^{2}, \qquad M = \mathrm{MC}\big(f_{\phi}(\boldsymbol{z}, \cdot)\big) \tag{7}$$

where $M$ is the mesh model generated using the Marching Cubes method $\mathrm{MC}(\cdot)$, $R(\cdot)$ is the differentiable contour rendering operation, $R(M)$ represents the contour image rendered from the mesh model, and $S$ is the contour image of the input sketch.

Fig. 3 [Images not available. See PDF.]
Qualitative evaluation against Sketch2Model and Deep3DSketch. The visualization of generated 3D models demonstrates that our method is capable of achieving higher fidelity in 3D content
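A minimal sketch of the contour loss in Eq. (7): it assumes a differentiable silhouette renderer (e.g., a soft rasterizer) has already produced a soft silhouette `rendered_sil` of the mesh, and compares it with the input sketch's silhouette. Both the soft-IoU and the pixel-wise L2 variants shown are common choices, not necessarily the authors' exact formulation.

```python
import torch

def contour_loss(rendered_sil: torch.Tensor, target_sil: torch.Tensor,
                 use_iou: bool = True) -> torch.Tensor:
    """rendered_sil, target_sil: (B, H, W) silhouettes with values in [0, 1]."""
    if use_iou:
        # Soft-IoU formulation, commonly paired with differentiable silhouette rendering.
        inter = (rendered_sil * target_sil).sum(dim=(1, 2))
        union = (rendered_sil + target_sil - rendered_sil * target_sil).sum(dim=(1, 2))
        return (1.0 - inter / union.clamp(min=1e-6)).mean()
    # Plain pixel-wise L2 alternative, matching the form written in Eq. (7).
    return ((rendered_sil - target_sil) ** 2).mean()

rendered = torch.rand(2, 224, 224, requires_grad=True)   # stand-in for R(M)
target = (torch.rand(2, 224, 224) > 0.5).float()          # stand-in for the sketch silhouette S
contour_loss(rendered, target).backward()
```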
Table 3. Quantitative evaluation on the ShapeNet-Sketch dataset (Chamfer Distance ↓)
| Method | Airplane | Bench | Cabinet | Car | Chair | Display | Lamp |
|---|---|---|---|---|---|---|---|
| Sketch2Model | 41.3703 | 56.2164 | 45.3088 | 20.2802 | 61.6435 | 66.7982 | 57.7219 |
| Deep3DSketch | 45.8801 | 60.3969 | 41.0214 | 28.9227 | 57.4727 | 89.4539 | 66.4928 |
| Ours | 27.6073 | 26.8087 | 39.8520 | 10.5690 | 45.1863 | 46.3550 | 47.5497 |

| Method | Loudspeaker | Rifle | Sofa | Table | Telephone | Watercraft | Mean |
|---|---|---|---|---|---|---|---|
| Sketch2Model | 58.8262 | 16.9269 | 68.0021 | 73.1595 | 37.4594 | 45.7468 | 49.9585 |
| Deep3DSketch | 62.0408 | 17.9286 | 77.1665 | 73.9634 | 36.7589 | 53.0370 | 54.6566 |
| Ours | 53.0491 | 10.7042 | 28.8466 | 59.6315 | 46.5207 | 25.8674 | 36.0421 |
Bold indicates the best result in this category
Table 4. Quantitative evaluation on the ShapeNet-Sketch dataset (Voxel IoU ↑)
| Method | Airplane | Bench | Cabinet | Car | Chair | Display | Lamp |
|---|---|---|---|---|---|---|---|
| Sketch2Model | 0.3683 | 0.3385 | 0.5603 | 0.6174 | 0.3358 | 0.3574 | 0.3951 |
| Deep3DSketch | 0.3627 | 0.3374 | 0.5742 | 0.5902 | 0.3493 | 0.3602 | 0.3792 |
| Ours | 0.4096 | 0.3914 | 0.5617 | 0.7092 | 0.3434 | 0.3717 | 0.4269 |

| Method | Loudspeaker | Rifle | Sofa | Table | Telephone | Watercraft | Mean |
|---|---|---|---|---|---|---|---|
| Sketch2Model | 0.5603 | 0.5332 | 0.4691 | 0.3369 | 0.5431 | 0.4356 | 0.4501 |
| Deep3DSketch | 0.5438 | 0.5231 | 0.4635 | 0.3389 | 0.5517 | 0.4196 | 0.4457 |
| Ours | 0.5703 | 0.5896 | 0.5359 | 0.3387 | 0.4444 | 0.4888 | 0.4755 |
Bold indicates the best result in this category
Fig. 4 [Images not available. See PDF.]
Visualization results on ShapeNet-Sketch. Our method’s performance on the ShapeNet-Sketch dataset, in comparison with Sketch2Model and Deep3DSketch, shows that the 3D shapes generated by our approach more closely align with the input sketches
Symmetry loss
Following contour optimization, the generated model’s appearance will, to some extent, align with the contour shape of the input sketch. To prevent overfitting during the optimization process, we incorporate the natural property of symmetry, inherent in many real-world objects, designing a second loss function to constrain the generated model’s shape. For a symmetric object, vertices on either side of the plane of symmetry appear in mirrored pairs. Since our reconstruction method produces models in a unified canonical alignment, the plane of symmetry is a fixed vertical plane, and we adopt its normal vector $\boldsymbol{n}$ accordingly. Denoting by $\mathcal{M}(\boldsymbol{v})$ the reflection of a vertex $\boldsymbol{v}$ across this plane and by $V$ the set of mesh vertices, the symmetry loss is defined as follows:

$$\mathcal{L}_{\mathrm{sym}} = \sum_{\boldsymbol{v} \in V} \min_{\boldsymbol{u} \in V} \big\| \mathcal{M}(\boldsymbol{v}) - \boldsymbol{u} \big\|_{2} \tag{8}$$
Not all objects exhibit perfect symmetry, however. The introduction of the symmetry strength measure constitutes another key innovation of our method, allowing us to handle varying degrees of symmetry in a more flexible and precise manner. For each mesh vertex, we assign a symmetry strength score, reflecting the intensity of the symmetry it exhibits. This not only enhances the model’s adaptability but also improves the precision of the optimization process. The symmetry loss incorporating the symmetry strength score $w_{\boldsymbol{v}}$ is defined as follows:

$$\mathcal{L}_{\mathrm{sym}} = \sum_{\boldsymbol{v} \in V} w_{\boldsymbol{v}} \min_{\boldsymbol{u} \in V} \big\| \mathcal{M}(\boldsymbol{v}) - \boldsymbol{u} \big\|_{2} + \lambda_{w} \sum_{\boldsymbol{v} \in V} (1 - w_{\boldsymbol{v}})^{2} \tag{9}$$
where $\boldsymbol{v}$ represents a mesh vertex and $w_{\boldsymbol{v}}$ denotes its symmetry strength score. The first term measures the distance between a vertex’s mirrored counterpart and its nearest neighbor on the mesh. The second term, weighted by a coefficient $\lambda_{w}$, is introduced to prevent the symmetry strength scores from becoming too small. When a vertex lacks a significant symmetric counterpart, its symmetry strength score approaches zero, and as symmetry intensifies, the score correspondingly increases. As shown in Fig. 6, areas with stronger symmetry have higher scores. This approach not only augments the model’s adaptability with respect to symmetry but also ensures the robustness and flexibility of the optimization process.

Table 5. Ablation study
| SIL | SYM | Airplane | Bench | Cabinet | Car | Chair | Display | Lamp |
|---|---|---|---|---|---|---|---|---|
| | | 16.6097 | 19.2898 | 21.2740 | 4.5490 | 9.9144 | 11.0213 | 38.6365 |
| | | 19.4129 | 20.0559 | 22.0905 | 4.4110 | 10.7254 | 10.3923 | 20.3744 |
| | | 15.9670 | 12.4740 | 19.5552 | 4.4271 | 11.0631 | 10.4728 | 17.9679 |

| SIL | SYM | Loudspeaker | Rifle | Sofa | Table | Telephone | Watercraft | Mean |
|---|---|---|---|---|---|---|---|---|
| | | 22.2488 | 4.9326 | 11.2237 | 12.9489 | 10.8285 | 20.7988 | 15.7135 |
| | | 18.5529 | 5.1816 | 8.2173 | 14.3677 | 7.5136 | 12.9523 | 13.4037 |
| | | 18.7626 | 4.0569 | 7.8844 | 14.7771 | 6.9051 | 12.8144 | 12.8144 |
Bold indicates the best result in this category
Our method achieved better results in most categories on the Chamfer Distance metric compared to the baseline method on the ShapeNet-Synthetic dataset
Table 6. Ablation study
| SIL | SYM | Airplane | Bench | Cabinet | Car | Chair | Display | Lamp |
|---|---|---|---|---|---|---|---|---|
| | | 0.5456 | 0.6476 | 0.6206 | 0.8487 | 0.6946 | 0.6532 | 0.5446 |
| | | 0.5500 | 0.6010 | 0.6134 | 0.8383 | 0.6756 | 0.6882 | 0.5695 |
| | | 0.5590 | 0.6760 | 0.6497 | 0.8524 | 0.6685 | 0.7008 | 0.5909 |

| SIL | SYM | Loudspeaker | Rifle | Sofa | Table | Telephone | Watercraft | Mean |
|---|---|---|---|---|---|---|---|---|
| | | 0.7179 | 0.7392 | 0.7393 | 0.6571 | 0.7783 | 0.6245 | 0.6778 |
| | | 0.7148 | 0.7402 | 0.7272 | 0.6421 | 0.8098 | 0.6540 | 0.6788 |
| | | 0.7283 | 0.7498 | 0.7410 | 0.6392 | 0.8298 | 0.6582 | 0.6957 |
Bold indicates the best result in this category
Our method outperformed the baseline method in most categories on the Voxel IoU metric on the ShapeNet-Synthetic dataset
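Returning to the symmetry loss of Eqs. (8)-(9), the sketch below shows one way to compute it in PyTorch: vertices are reflected across an assumed mirror plane through the origin, matched to their nearest mesh vertex, weighted by per-vertex strength scores, and regularized so the scores do not collapse to zero. The plane choice, the nearest-neighbour matching via `torch.cdist`, and the form of the score regularizer are assumptions consistent with the description above, not the authors' exact code.

```python
import torch

def symmetry_loss(verts: torch.Tensor, scores: torch.Tensor,
                  normal_axis: int = 0, reg_weight: float = 0.1) -> torch.Tensor:
    """verts: (V, 3) mesh vertices; scores: (V,) symmetry strength scores in [0, 1]."""
    # Reflect every vertex across the assumed symmetry plane through the origin.
    sign = torch.ones(3, device=verts.device)
    sign[normal_axis] = -1.0
    mirrored = verts * sign
    # Distance from each mirrored vertex to its nearest vertex on the mesh (first term of Eq. 9).
    nn_dist = torch.cdist(mirrored, verts).min(dim=1).values          # (V,)
    # Weight by the strength score, and keep the scores from collapsing to zero (second term).
    return (scores * nn_dist).mean() + reg_weight * ((1.0 - scores) ** 2).mean()

# Toy usage: random vertices and scores (in practice the scores come from the strength network).
verts = torch.rand(500, 3, requires_grad=True) * 2 - 1
scores = torch.sigmoid(torch.zeros(500, requires_grad=True))
symmetry_loss(verts, scores).backward()
```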
To prevent the model from falling into local minima during the optimization process and to comprehensively consider the model’s performance and complexity, we employ a composite loss function containing regularization terms, thereby enhancing the final robustness of our network. Our total loss function combines several key components as follows:
$$\mathcal{L}_{\mathrm{total}} = \lambda_{\mathrm{sil}} \mathcal{L}_{\mathrm{sil}} + \lambda_{\mathrm{sym}} \mathcal{L}_{\mathrm{sym}} + \lambda_{\mathrm{lap}} \mathcal{L}_{\mathrm{lap}} \tag{10}$$

where $\mathcal{L}_{\mathrm{sil}}$ represents the contour loss, ensuring the generated model’s contours align with those of the input sketch; $\mathcal{L}_{\mathrm{sym}}$ denotes the symmetry loss, aimed at preserving the model’s symmetry properties; and $\mathcal{L}_{\mathrm{lap}}$ is the newly introduced Laplacian loss, designed to smooth the optimization process and enhance the model’s overall stability. The weights $\lambda_{\mathrm{sil}}$, $\lambda_{\mathrm{sym}}$, and $\lambda_{\mathrm{lap}}$ balance the three terms. This composite loss is designed to balance the model’s precision and generalization capability while avoiding overfitting and optimization pitfalls, ensuring a smooth optimization process and high-quality final results.

Fig. 5 [Images not available. See PDF.]
Ablation study. Compared to the baseline method, our approach is capable of generating 3D shapes with enhanced details
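The composite objective of Eq. (10) can be assembled as sketched below. The uniform-Laplacian smoothing term and the default weights (taken from the hyperparameters reported in the experiments section, under the assumption that they correspond to the contour, symmetry, and Laplacian terms in that order) are illustrative, not the authors' exact implementation.

```python
import torch

def laplacian_loss(verts: torch.Tensor, faces: torch.Tensor) -> torch.Tensor:
    """Uniform Laplacian smoothing: each vertex is pulled toward the mean of its neighbours."""
    num_verts = verts.shape[0]
    # Collect the undirected edge set of the triangle mesh (both directions, de-duplicated).
    e = torch.cat([faces[:, [0, 1]], faces[:, [1, 2]], faces[:, [2, 0]]], dim=0)
    edges = torch.unique(torch.cat([e, e.flip(1)], dim=0), dim=0)
    src, dst = edges[:, 0], edges[:, 1]
    neighbor_sum = torch.zeros_like(verts).index_add_(0, src, verts[dst])
    degree = torch.zeros(num_verts, device=verts.device).index_add_(
        0, src, torch.ones(src.shape[0], device=verts.device))
    lap = verts - neighbor_sum / degree.clamp(min=1.0).unsqueeze(1)
    return lap.norm(dim=1).mean()

def total_loss(loss_sil, loss_sym, loss_lap, w_sil=1.0, w_sym=0.1, w_lap=0.01):
    """Weighted sum of the contour, symmetry, and Laplacian terms (Eq. 10)."""
    return w_sil * loss_sil + w_sym * loss_sym + w_lap * loss_lap

# Dummy mesh just to exercise the Laplacian term.
verts = torch.rand(100, 3)
faces = torch.randint(0, 100, (200, 3))
print(laplacian_loss(verts, faces))
```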
Experimental results
This study conducted extensive experiments on two major datasets: one is the synthesized sketch dataset from ShapeNetCore.v1, and the other is the ShapeNet-Sketch dataset. These experiments were aimed at comprehensively evaluating the performance of our sketch-based 3D reconstruction method and conducting a comparative analysis with existing techniques, such as Sketch2Model and Deep3DSketch.
Datasets and experiments
Fig. 6 [Images not available. See PDF.]
From left to right: the sketch, the original mesh, the refined mesh, and the vertex symmetry strength scores (represented as points or colors on the refined mesh). Green indicates higher symmetry strength
ShapeNet-Synthetic Dataset (from ShapeNetCore.v1): We selected the 13 categories most commonly used in sketch-based 3D reconstruction from the ShapeNetCore.v1 dataset. This amounts to 34,749 model files, each accompanied by 36 rendered images from different viewpoints, with an image size of 224×224. The synthetic sketch dataset was created by applying the Canny edge detection algorithm to the original renderings of ShapeNetCore.v1, aiming to mimic the style and features of hand-drawn sketches. Moreover, we sampled signed distance function (SDF) values from the model files in ShapeNetCore.v1 to provide complete training data.
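A minimal example of the synthetic-sketch generation described above, using OpenCV's Canny edge detector on a rendering; the file names and thresholds are illustrative assumptions.

```python
import cv2

render = cv2.imread("shapenet_render.png", cv2.IMREAD_GRAYSCALE)   # a 224x224 rendering (assumed path)
edges = cv2.Canny(render, threshold1=50, threshold2=150)           # white edge map on black background
sketch = 255 - edges                                                # invert: dark strokes on white, sketch-like
cv2.imwrite("synthetic_sketch.png", sketch)
```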
ShapeNet-Sketch: For quantitative assessment of our method on free-hand sketches and to further facilitate research, we utilized the ShapeNet-Sketch dataset. Introduced with Sketch2Model [6] and built on renderings from Kar et al. [42], this dataset serves to evaluate 3D reconstruction methods based on hand-drawn sketches. It comprises 1300 free-hand sketches and their corresponding ground-truth 3D models across 13 ShapeNet categories, with 100 rendered images randomly selected for each category. Individuals with varying drawing skills sketched the objects on a touchpad based on the rendered images, thereby creating the dataset. This dataset is particularly suitable for assessing the model’s ability to handle natural hand-drawn sketches.
Experiment Settings: To ensure consistency and accuracy, we defined a canonical viewpoint for objects in each category. We used elevation and azimuth angles to describe changes in viewpoint when observing objects from different perspectives. In this setup, both the canonical elevation and azimuth angles were set to 0 degrees, with slight adjustments made to the camera-to-object distance to improve the diversity and coverage of viewpoints. This standardizes the input data and facilitates consistent reconstruction results across multiple viewpoints. For network training, we employed the Adam optimizer. The training process was divided into two phases: the initial learning rate was set at 1 for the first phase, decaying by a factor of 0.5 every 500 training epochs; for the second phase, the learning rate was set to 5e-5, with the same decay schedule every 500 epochs. This learning rate strategy aids rapid convergence in the early stages and maintains model stability later on. Furthermore, we sampled 8,192 points for each object to ensure adequate data density and coverage, thereby enhancing the accuracy of the reconstruction quality and details. The loss weights in Eq. (10) were set to $\lambda_{\mathrm{sil}} = 1$, $\lambda_{\mathrm{sym}} = 0.1$, and $\lambda_{\mathrm{lap}} = 0.01$.
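For reference, a sketch of the described optimizer schedule with PyTorch's Adam and StepLR (decay by a factor of 0.5 every 500 epochs); the parameters shown are placeholders, not the authors' network.

```python
import torch

params = [torch.nn.Parameter(torch.zeros(10))]     # stand-in for the decoder parameters
optimizer = torch.optim.Adam(params, lr=5e-5)      # second-phase learning rate from the text
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=500, gamma=0.5)

for epoch in range(1500):
    # ... compute the loss and call loss.backward() here ...
    optimizer.step()
    scheduler.step()                               # halves the learning rate every 500 epochs
print(optimizer.param_groups[0]["lr"])             # 5e-5 * 0.5**3 after 1500 epochs
```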
Ablation study
Tables 5 and 6 present detailed results from ablation studies on several key designs within our method. By analyzing these components, we gain deep insights into each part’s impact on overall model performance. Initially, we examined the case using only the contour loss (SIL), which showed good reconstruction results but insufficient expression of symmetry in some models. This indicates that while the contour loss effectively captures the outline features of sketches, it falls short in capturing the models’ symmetry. Introducing the symmetry loss (SYM) significantly improved the reconstructed models, especially in terms of symmetry representation, underscoring the importance of the symmetry loss in our approach, particularly for objects with notable symmetry. Finally, incorporating the symmetry strength measure score (SSM) further enhanced performance. This improvement demonstrates that, by introducing the symmetry strength score, our method can more flexibly handle varying degrees of symmetry, thus improving overall reconstruction precision and quality while maintaining model symmetry. The corresponding visual ablation results are illustrated in Fig. 5, showcasing the impact of the different components on reconstruction quality.
Comparative experiments
We compared our method against two leading approaches, Sketch2Model and Deep3DSketch. To ensure a fair comparison, we trained a separate model for each category in the dataset. Detailed comparison results can be found in Tables 1 and 2. All methods were trained and tested on the same data to maintain assessment consistency. We focused on evaluating each method’s performance in handling objects with complex geometric structures and sketches of different styles. Performance evaluation was based primarily on two key metrics: Chamfer Distance and Voxel IoU, which together assess the reconstructed model’s precision and quality. Chamfer Distance measures the geometric difference between the predicted and ground-truth models, while Voxel IoU reflects shape similarity at the voxel level. Experimental results show our method’s significant advantage in most categories, especially in terms of reconstruction precision, as illustrated in Fig. 3. This outcome validates our method’s efficiency in deciphering the complex geometric information in sketches, particularly for sketches with intricate details and diverse styles. To further demonstrate the superiority of our method, we conducted additional comparative experiments on the ShapeNet-Sketch dataset. The results of these tests are detailed in Tables 3 and 4, highlighting our method’s capability and efficiency in processing diverse sketch data. Moreover, we visually present the test results on the ShapeNet-Sketch dataset in Fig. 4, further validating the effectiveness of our method in 3D model reconstruction.
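For completeness, simple NumPy versions of the two evaluation metrics are sketched below; the exact normalization used in the paper's tables (e.g., squared vs. unsquared distances, scaling factors) is not specified here, so these are assumed reference implementations.

```python
import numpy as np

def chamfer_distance(p1: np.ndarray, p2: np.ndarray) -> float:
    """p1: (N, 3), p2: (M, 3). Mean of nearest-neighbour squared distances, both directions."""
    d = np.linalg.norm(p1[:, None, :] - p2[None, :, :], axis=-1)   # (N, M) pairwise distances
    return float((d.min(axis=1) ** 2).mean() + (d.min(axis=0) ** 2).mean())

def voxel_iou(v1: np.ndarray, v2: np.ndarray) -> float:
    """v1, v2: boolean occupancy grids of identical shape."""
    inter = np.logical_and(v1, v2).sum()
    union = np.logical_or(v1, v2).sum()
    return float(inter) / max(float(union), 1.0)

a, b = np.random.rand(1024, 3), np.random.rand(1024, 3)
print(chamfer_distance(a, b))
print(voxel_iou(np.random.rand(32, 32, 32) > 0.5, np.random.rand(32, 32, 32) > 0.5))
```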
Conclusions
This study introduces an efficient method for single-view 3D reconstruction from sketches, effectively transforming hand-drawn sketches into precise 3D shapes. Through an encoder–decoder architecture, our method accurately captures the essential features of sketches and enhances reconstruction quality through model generation and optimization stages. Specifically, contour optimization ensures the reconstructed model’s contours align with the original sketch, while symmetry loss and symmetry strength measure significantly boost the model’s accuracy and adaptability. Our experimental results demonstrate superior performance on a variety of sketches, especially in handling complex structures. Overall, this work not only advances the performance of sketch-based 3D reconstruction techniques but also opens new directions for future research in this field. Future work could further explore this foundation to enhance the efficiency and accuracy of 3D reconstruction.
Author Contributions
Y.Z. and Y.Q.H designed the research. Y.Q.H and J.Q.Z processed the data and performed the experiments. Y.Q.H, J.Q.Z, L.B and J.L drafted the manuscript. All authors reviewed and contributed to the manuscript.
Funding
This paper is supported by the Public Welfare Research Program of Huzhou Science and Technology Bureau (2022GZ01)
Data availability
No datasets were generated or analyzed during the current study.
Declarations
Conflict of interest
The authors declare no competing interests.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
1. Nazir, A; Cheema, MN; Sheng, B; Li, P; Kim, J; Lee, T-Y. Living donor-recipient pair matching for liver transplant via ternary tree representation with cascade incremental learning. IEEE Trans. Biomed. Eng.; 2021; 68,
2. Sra, M., Garrido-Jurado, S., Schmandt, C., Maes, P.: Procedurally generated virtual reality from 3d reconstructed physical space. In: Proceedings of the 22nd ACM Conference on Virtual Reality Software and Technology, pp. 191–200 (2016)
3. Wang, M; Lyu, X-Q; Li, Y-J; Zhang, F-L. VR content creation and exploration with deep learning: a survey. Comput. Vis. Media; 2020; 6, pp. 3-28. [DOI: https://dx.doi.org/10.1007/s41095-020-0162-z]
4. Lee, J., Gupta, M.: Blocks-world cameras. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11412–11422 (2021)
5. Sayed, N; Zayed, HH; Sharawy, MI. Arsc: augmented reality student card an augmented reality solution for the education field. Comput. Educ.; 2011; 56,
6. Zhang, S.-H., Guo, Y.-C., Gu, Q.-W.: Sketch2model: View-aware 3d modeling from single free-hand sketches. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6012–6021 (2021)
7. Sinha, S., Zhang, J.Y., Tagliasacchi, A., Gilitschenski, I., Lindell, D.B.: Sparsepose: Sparse-view camera pose regression and refinement. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 21349–21359 (2023)
8. Li, C; Pan, H; Bousseau, A; Mitra, NJ. Sketch2cad: sequential cad modeling by sketching in context. ACM Trans. Gr. (TOG); 2020; 39,
9. Xie, Z; Zhang, W; Sheng, B; Li, P; Chen, CP. Bagfn: broad attentive graph fusion network for high-order feature interactions. IEEE Trans. Neural Netw. Learn. Syst.; 2021; 34,
10. Chang, A.X., Funkhouser, T., Guibas, L., Hanrahan, P., Huang, Q., Li, Z., Savarese, S., Savva, M., Song, S., Su, H., et al.: Shapenet: an information-rich 3d model repository. arXiv preprint arXiv:1512.03012 (2015)
11. Bonnici, A; Akman, A; Calleja, G; Camilleri, KP; Fehling, P; Ferreira, A; Hermuth, F; Israel, JH; Landwehr, T; Liu, J et al. Sketch-based interaction and modeling: where do we stand?. AI EDAM; 2019; 33,
12. Olsen, L; Samavati, FF; Sousa, MC; Jorge, JA. Sketch-based modeling: a survey. Comput. Gr.; 2009; 33,
13. Plumed, R; Varley, PA; Company, P; Martin, R. Extracting datums to reconstruct CSG models from 2d engineering sketches of polyhedral shapes. Comput. Gr.; 2022; 102, pp. 349-359. [DOI: https://dx.doi.org/10.1016/j.cag.2021.10.013]
14. Tanaka, M; Terano, M; Asano, T; Higashino, C. Method to automatically convert sketches of mechanical objects into 3d models. Comput. Aided Design Appl.; 2020; 17,
15. Camba, JD; Company, P; Naya, F. Sketch-based modeling in mechanical engineering design: current status and opportunities. Comput. Aided Des.; 2022; 150,
16. Park, J.J., Florence, P., Straub, J., Newcombe, R., Lovegrove, S.: Deepsdf: Learning continuous signed distance functions for shape representation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 165–174 (2019)
17. Liu, S., Li, T., Chen, W., Li, H.: Soft rasterizer: A differentiable renderer for image-based 3d reasoning. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 7708–7717 (2019)
18. Liu, S., Saito, S., Chen, W., Li, H.: Learning to infer implicit surfaces without 3d supervision. Adv. Neural Inf. Process. Syst. 32 (2019)
19. Kato, H., Ushiku, Y., Harada, T.: Neural 3d mesh renderer. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3907–3916 (2018)
20. Lin, C-H; Wang, C; Lucey, S. Sdf-srn: learning signed distance 3d object reconstruction from static images. Adv. Neural. Inf. Process. Syst.; 2020; 33, pp. 11453-11464.
21. Chen, Z; Qiu, G; Li, P; Zhu, L; Yang, X; Sheng, B. Mngnas: distilling adaptive combination of multiple searched networks for one-shot neural architecture search. IEEE Trans. Pattern Anal. Machine Intell.; 2023; 45, pp. 13489-13508.
22. Leung, B., Ho, C.-H., Vasconcelos, N.: Black-box test-time shape refinement for single view 3d reconstruction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4080–4090 (2022)
23. Chen, D.-Y., Tian, X.-P., Shen, Y.-T., Ouhyoung, M.: On visual similarity based 3d model retrieval. In: Computer Graphics Forum, vol. 22, pp. 223–232. Wiley Online Library (2003)
24. Wang, F., Kang, L., Li, Y.: Sketch-based 3d shape retrieval using convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1875–1883 (2015)
25. Sangkloy, P; Burnell, N; Ham, C; Hays, J. The sketchy database: learning to retrieve badly drawn bunnies. ACM Trans. Gr. (TOG); 2016; 35,
26. Nishida, G; Garcia-Dorado, I; Aliaga, DG; Benes, B; Bousseau, A. Interactive sketching of urban procedural models. ACM Trans. Gr. (TOG); 2016; 35,
27. Giunchi, D., James, S., Steed, A.: 3d sketching for interactive model retrieval in virtual reality. In: Proceedings of the Joint Symposium on Computational Aesthetics and Sketch-Based Interfaces and Modeling and Non-Photorealistic Animation and Rendering, pp. 1–12 (2018)
28. Huang, H; Kalogerakis, E; Yumer, E; Mech, R. Shape synthesis from sketches via procedural models and convolutional networks. IEEE Trans. Visual Comput. Gr.; 2016; 23,
29. Nie, W-Z; Ren, M-J; Liu, A-A; Mao, Z; Nie, J. M-gcn: multi-branch graph convolution network for 2d image-based on 3d model retrieval. IEEE Trans. Multimed.; 2020; 23, pp. 1962-1976. [DOI: https://dx.doi.org/10.1109/TMM.2020.3006371]
30. Al-Jebrni, AH; Ali, SG; Li, H; Lin, X; Li, P; Jung, Y; Kim, J; Feng, DD; Sheng, B; Jiang, L et al. Sthy-net: a feature fusion-enhanced dense-branched modules network for small thyroid nodule classification from ultrasound images. Vis. Comput.; 2023; 39,
31. Guo, H; Sheng, B; Li, P; Chen, CP. Multiview high dynamic range image synthesis using fuzzy broad learning system. IEEE Trans. Cybern.; 2019; 51,
32. Zhou, Y; Chen, Z; Li, P; Song, H; Chen, CP; Sheng, B. Fsad-net: feedback spatial attention dehazing network. IEEE Trans. Neural Netw. Learn. Syst.; 2022; 34, pp. 7719-7733. [DOI: https://dx.doi.org/10.1109/TNNLS.2022.3146004]
33. Gadelha, M., Wang, R., Maji, S.: Shape reconstruction using differentiable projections and deep priors. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 22–30 (2019)
34. Zang, Y., Ding, C., Chen, T., Mao, P., Hu, W.: Deep3dsketch++: High-fidelity 3d modeling from single free-hand sketches. In: 2023 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 1537–1542. IEEE (2023)
35. Hu, X., Zhu, F., Liu, L., Xie, J., Tang, J., Wang, N., Shen, F., Shao, L.: Structure-aware 3d shape synthesis from single-view images. In: BMVC, p. 230 (2018)
36. Chen, T., Fu, C., Zhu, L., Mao, P., Zhang, J., Zang, Y., Sun, L.: Deep3dsketch: 3d modeling from free-hand sketches with view-and structural-aware adversarial training. In: ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1–5. IEEE (2023)
37. Chen, Z., Zhang, H.: Learning implicit fields for generative shape modeling. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5939–5948 (2019)
38. Yu, A., Ye, V., Tancik, M., Kanazawa, A.: pixelnerf: Neural radiance fields from one or few images. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4578–4587 (2021)
39. Niemeyer, M., Mescheder, L., Oechsle, M., Geiger, A.: Differentiable volumetric rendering: Learning implicit 3d representations without 3d supervision. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3504–3515 (2020)
40. Huang, Q; Wang, H; Koltun, V. Single-view reconstruction via joint analysis of image and shape collections. ACM Trans. Gr.; 2015; 34,
41. Choy, C.B., Xu, D., Gwak, J., Chen, K., Savarese, S.: 3d-r2n2: A unified approach for single and multi-view 3d object reconstruction. In: Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part VIII 14, pp. 628–644. Springer (2016)
42. Kar, A., Häne, C., Malik, J.: Learning a multi-view stereo machine. Adv. Neural Inf. Process. Syst. 30 (2017)