Abstract

This exploratory study investigates the usability of performance metrics for generative adversarial network (GAN)-based models for speech-driven facial animation. These models transfer speech information from an audio file to a still image to generate talking-head videos in a small-scale “everyday usage” setting. Two models, LipGAN and a custom implementation of a Wasserstein GAN with gradient penalty (L1WGAN-GP), are examined for their visual performance and for their scores on commonly used metrics. Quantitative comparisons using the FID, SSIM, and PSNR metrics on the GRIDTest dataset show mixed results, and the metrics fail to capture local artifacts that are crucial for lip synchronization, which limits their applicability to video animation tasks. The study points to the inadequacy of current quantitative measures and emphasizes the continued necessity of human qualitative assessment for evaluating talking-head video quality.

Details

1009240
Business indexing term
Title
All’s Well That FID’s Well? Result Quality and Metric Scores in GAN Models for Lip-Synchronization Tasks
Author
Geldhauser, Carina 1; Liljegren, Johan 2; Nordqvist, Pontus 2

1 Department Mathematik, ETH Zurich, 8092 Zurich, Switzerland
2 Centre for Mathematical Sciences, Lund University, P.O. Box 118, 22100 Lund, Sweden
Publication title
Volume
14
Issue
17
First page
3487
Number of pages
20
Publication year
2025
Publication date
2025
Publisher
MDPI AG
Place of publication
Basel
Country of publication
Switzerland
Publication subject
e-ISSN
2079-9292
Source type
Scholarly Journal
Language of publication
English
Document type
Journal Article
Publication history
Online publication date
2025-08-31
Milestone dates
2025-07-02 (Received); 2025-08-25 (Accepted)
   First posting date
31 Aug 2025
ProQuest document ID
3249684759
Document URL
https://www.proquest.com/scholarly-journals/all-s-well-that-fid-result-quality-metric-scores/docview/3249684759/se-2?accountid=208611
Copyright
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2025-09-12
Database
ProQuest One Academic