
Abstract

Scene graphs encode relationships between image entities as triplets (subject, relationship, object), where nodes represent grounded entities and directed edges define relationships from subject to object. The Scene Graph Generation (SGG) task faces significant challenges, including difficulty in detecting small or occluded entities and in classifying entities and relationships under imbalanced class distributions and ambiguous annotations. As a result, SGG models often suffer from low accuracy and a bias toward frequently occurring classes. Existing methods mitigate this bias by re-weighting training samples or post-processing inference results, but these approaches often trade overall accuracy for a more balanced class distribution.
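As a rough illustration of the triplet representation described above, the following is a minimal sketch in Python; all names (Entity, Triplet, the example labels and boxes) are hypothetical and not taken from the thesis:

```python
from dataclasses import dataclass

@dataclass
class Entity:
    label: str    # entity class, e.g., "person"
    box: tuple    # grounded bounding box (x1, y1, x2, y2)

@dataclass
class Triplet:
    subject: Entity
    relationship: str  # directed edge label from subject to object
    obj: Entity

# A scene graph is simply a collection of such directed triplets.
scene_graph = [
    Triplet(Entity("person", (10, 20, 110, 220)), "riding",
            Entity("horse", (60, 120, 300, 380))),
]
```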

In this thesis, we leverage prior knowledge of scene graph triplets to enhance accuracy and mitigate bias in trained SGG models in a principled manner. We propose a Bayesian Network (BN) to capture the stable within-triplet prior and a Conditional Random Field (CRF) to model the between-triplet prior of scene graph triplets. Applying BN inference to the uncertain evidence produced by a biased SGG model improves overall accuracy while mitigating bias. The CRF further refines predictions by combining unary potentials derived from the BN posterior with pairwise potentials that represent the between-triplet prior learned from triplet co-occurrence statistics.
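To make the inference step concrete, here is a minimal sketch of combining a within-triplet prior with a biased model's soft scores treated as uncertain (virtual) evidence; the function name, array shapes, and numbers are illustrative assumptions, not the thesis's implementation:

```python
import numpy as np

def posterior_with_soft_evidence(prior, model_scores):
    """prior: P(relationship | subject, object) from the within-triplet BN;
    model_scores: the biased SGG model's softmax output, treated as a
    likelihood (virtual evidence). Returns the renormalized posterior."""
    unnorm = prior * model_scores
    return unnorm / unnorm.sum()

prior = np.array([0.70, 0.20, 0.10])    # e.g., P(r | person, horse)
scores = np.array([0.30, 0.55, 0.15])   # biased model's softmax scores
print(posterior_with_soft_evidence(prior, scores))
```

In the full model, these posteriors would serve as the CRF's unary potentials, with pairwise potentials drawn from triplet co-occurrence statistics.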

Beyond improving performance on static scene graphs, we address the challenge of integrating static and temporal potentials in Dynamic Scene Graph (DSG) generation. Existing methods implicitly assume that all relationships in a DSG are purely temporal, neglecting their static components. To address this, we propose a Transformer-based CRF model that captures both static and long-term temporal potentials, and we demonstrate its superiority over traditional Transformer-based approaches.
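One way to read "both static and long-term temporal potentials" is as two scoring branches that are fused per frame. The sketch below is an assumed simplification (module names, dimensions, and the additive fusion are all hypothetical), not the architecture from the thesis:

```python
import torch
import torch.nn as nn

class StaticTemporalScorer(nn.Module):
    """Per-frame features yield static potentials; a Transformer encoder
    over the frame sequence yields long-term temporal potentials; the two
    are summed into per-frame relationship logits."""
    def __init__(self, dim=256, num_rel=50, nhead=8, nlayers=2):
        super().__init__()
        self.static_head = nn.Linear(dim, num_rel)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=nhead,
                                           batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=nlayers)
        self.temporal_head = nn.Linear(dim, num_rel)

    def forward(self, frame_feats):              # (batch, time, dim)
        static_logits = self.static_head(frame_feats)
        temporal_logits = self.temporal_head(self.temporal(frame_feats))
        return static_logits + temporal_logits   # fused potentials
```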

Finally, we showcase the effectiveness of scene graphs as a bridge for Visual Question Answering (VQA). Prior work on SG-based VQA assumes that every question can be answered solely from a perfect scene graph, leading to poor performance on questions unrelated to the scene graph. To overcome this limitation, we introduce an uncertainty-guided approach that combines predictions from two Bayesian ensembles, one for image-based VQA and another for SG-based VQA, yielding more robust and accurate question answering.
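A minimal sketch of one plausible uncertainty-guided fusion follows, weighting each ensemble's averaged prediction inversely to its predictive entropy; the weighting scheme and all names are assumptions for illustration, not the thesis's method:

```python
import numpy as np

def entropy(p):
    """Predictive entropy of a categorical distribution."""
    return -(p * np.log(p + 1e-12)).sum(-1)

def fuse(img_ens, sg_ens):
    """img_ens, sg_ens: (members, num_answers) softmax outputs from the
    image-based and SG-based Bayesian ensembles. Each branch's members
    are averaged, then the branches are weighted by inverse entropy."""
    p_img, p_sg = img_ens.mean(axis=0), sg_ens.mean(axis=0)
    w_img = 1.0 / (entropy(p_img) + 1e-6)
    w_sg = 1.0 / (entropy(p_sg) + 1e-6)
    fused = (w_img * p_img + w_sg * p_sg) / (w_img + w_sg)
    return fused / fused.sum()
```

Under this scheme, a confident SG branch dominates on questions the scene graph can answer, while the image branch takes over when the graph is uninformative.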

Details

Title
Probabilistic Scene Graph Generation and Its Applications
Number of pages
151
Publication year
2025
Degree date
2025
School code
0185
Source
DAI-B 86/12(E), Dissertation Abstracts International
ISBN
9798280713369
Advisor
Committee member
Radke, Richard J.; Yazici, Birsen; Lai, Rongjie
University/institution
Rensselaer Polytechnic Institute
Department
Electrical Engineering
University location
United States -- New York
Degree
Ph.D.
Source type
Dissertation or Thesis
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
31934168
ProQuest document ID
3216675606
Document URL
https://www.proquest.com/dissertations-theses/probabilistic-scene-graph-generation-applications/docview/3216675606/se-2?accountid=208611
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database
ProQuest One Academic