© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Fine-tuning diffusion models is crucial for generating contextually coherent digital characters across various media. This paper examines four fine-tuning techniques: Low-Rank Adaptation (LoRA), DreamBooth, Hypernetworks, and Textual Inversion. Each technique improves the specificity and consistency of character generation, broadening the applications of diffusion models in digital content creation. LoRA adapts models to new tasks by fine-tuning only low-rank matrices within the cross-attention layers, which speeds up training, keeps the number of adjusted parameters small, and makes it well suited to low-VRAM environments with limited computational resources. DreamBooth generates highly detailed, subject-specific images but is computationally intensive and best suited to robust hardware. Hypernetworks introduce auxiliary networks that dynamically adjust the model’s behavior, allowing flexibility during inference and on-the-fly model switching; this adaptability, however, can result in slightly lower image quality. Textual Inversion embeds new concepts directly into the model’s embedding space, allowing rapid adaptation to novel styles or concepts, but it is less effective for precise character generation. The analysis shows that LoRA is the most efficient option for producing high-quality outputs with minimal computational overhead, whereas DreamBooth excels at producing high-fidelity images at the cost of longer training. Hypernetworks provide adaptability with some tradeoffs in quality, and Textual Inversion serves as a lightweight option for style integration. Together, these techniques enhance the creative capabilities of diffusion models, delivering high-quality, contextually relevant outputs.
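
The low-rank idea behind LoRA mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the scaling factor alpha/r, and the layer size of 768 are assumptions chosen for illustration, and pure-Python lists stand in for tensors. A frozen weight matrix W receives an additive update B·A of rank at most r, so only r·(d_in + d_out) values are trained instead of d_in·d_out.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_forward(x, w, a, b, alpha):
    """Apply a frozen weight w plus a scaled low-rank update (b @ a) to x.

    Assumed shapes: x is (n, d_in), w is (d_out, d_in),
    a is (r, d_in), b is (d_out, r). Only a and b would be trained.
    """
    r = len(a)
    scale = alpha / r  # assumed scaling convention for this sketch
    delta = matmul(b, a)  # (d_out, d_in), rank at most r
    w_eff = [
        [w_val + scale * d_val for w_val, d_val in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]
    w_eff_t = [list(col) for col in zip(*w_eff)]  # transpose to (d_in, d_out)
    return matmul(x, w_eff_t)  # y = x @ w_eff^T


def trainable_params(d_out, d_in, r):
    """Return (full fine-tuning count, low-rank count) for one weight matrix."""
    return d_out * d_in, r * (d_in + d_out)


if __name__ == "__main__":
    # Hypothetical layer size and rank, chosen only to show the ratio.
    full, low_rank = trainable_params(768, 768, 4)
    print(f"full: {full}, low-rank: {low_rank}, ratio: {low_rank / full:.4f}")
```

When b is initialized to zeros, the update vanishes and the layer reproduces the frozen model's output, so training starts from the pretrained behavior and only gradually departs from it.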

Details

Title
Advancing Persistent Character Generation: Comparative Analysis of Fine-Tuning Techniques for Diffusion Models
Author
Martini, Luca 1; Iacono, Saverio 2; Zolezzi, Daniele 1; Vercelli, Gianni Viardo 2

1 Department of Languages and Modern Culture, University of Genova, 16145 Genova, Italy; [email protected]
2 Department of Computer Science and Technology, Bioengineering, Robotics and Systems Engineering, University of Genova, 16145 Genova, Italy; [email protected] (S.I.);
First page
1779
Publication year
2024
Publication date
2024
Publisher
MDPI AG
e-ISSN
2673-2688
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
3149498942