
Abstract

Users often struggle to access relevant information that is scattered across multiple disparate sources. For example, a healthcare professional might need patient records, medical research papers, and pharmaceutical data that are stored in separate databases. Such fragmentation causes inefficiencies and delays critical decisions, highlighting the need for systems that aggregate information from diverse repositories.

Federated Search (also known as Distributed Information Retrieval) addresses this by allowing users to submit a single query and receive integrated results from multiple heterogeneous resources without a central index. These systems distribute the search process across multiple nodes, which improves scalability and efficiency. This is particularly valuable for academic journals, enterprise systems, and segments of the deep web that are inaccessible to standard search engines.

Existing federated search techniques primarily employ term-based statistical methods, such as query-based sampling and resource descriptions built from term distributions. However, these approaches often fail to capture complex semantic relationships and the internal diversity of resources. Representing each resource as a single-point embedding further limits retrieval precision, because a single point is an insufficient semantic representation of the resource.
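
As an illustration of the term-based baseline described above, the following is a minimal sketch of resource selection from sampled documents. It is not taken from the dissertation: the function names, the add-alpha smoothing, and the toy samples are all assumptions chosen for the example.

```python
import math
from collections import Counter


def describe_resource(sampled_docs):
    """Build a term-frequency description from documents sampled from a resource."""
    counts = Counter()
    for doc in sampled_docs:
        counts.update(doc.lower().split())
    return counts


def score_resource(query, description, vocab_size, alpha=1.0):
    """Log-likelihood of the query under an add-alpha smoothed unigram model.

    The smoothing scheme is an assumption made for this sketch.
    """
    total = sum(description.values())
    score = 0.0
    for term in query.lower().split():
        prob = (description[term] + alpha) / (total + alpha * vocab_size)
        score += math.log(prob)
    return score


def rank_resources(query, samples_by_resource):
    """Rank resources by how well their sampled term distribution fits the query."""
    descriptions = {
        name: describe_resource(docs) for name, docs in samples_by_resource.items()
    }
    vocab = set()
    for description in descriptions.values():
        vocab.update(description)
    scores = {
        name: score_resource(query, description, len(vocab))
        for name, description in descriptions.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical samples, standing in for documents obtained by query-based sampling.
samples = {
    "clinical_records": ["patient blood pressure record", "patient allergy record"],
    "research_papers": ["randomized trial of blood pressure drug", "drug trial outcomes"],
}
ranking = rank_resources("drug trial", samples)
```

Because each resource is reduced to one bag of term counts, a query that uses different wording from the sampled documents scores poorly even when the resource is relevant, which is the semantic gap the abstract points to.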

This dissertation investigates advanced representation learning methods to enhance resource selection in federated search. Specifically, it addresses the limitations of current approaches by modeling the semantic diversity within resources and the intricate relationships between queries and resources. Building upon previous studies, we integrate sophisticated resource modeling with advanced neural architectures, including Graph Neural Networks (GNNs) and pre-trained language models, to improve retrieval accuracy.

We further propose representing resources as hyperrectangular boxes in latent space. Boxes offer a richer semantic representation than single-point embeddings and capture the internal diversity and varying relevance of resource content. Recognizing that resources and user interests evolve over time, we incorporate temporal dynamics and user interaction patterns, enabling adaptive and contextually relevant retrieval.
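
To make the box idea concrete, here is a minimal sketch of a resource represented as a hyperrectangle and scored against a query point. The abstract does not specify how boxes are learned or scored, so the `Box` class, the point-to-box distance, and the toy coordinates are assumptions for illustration only.

```python
import math
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned hyperrectangle given by its lower and upper corners."""
    lower: tuple
    upper: tuple

    def distance_to(self, point):
        """Euclidean distance from a point to the box (0.0 if the point is inside)."""
        squared = 0.0
        for x, lo, hi in zip(point, self.lower, self.upper):
            gap = max(lo - x, 0.0, x - hi)
            squared += gap * gap
        return math.sqrt(squared)

    def volume(self):
        """Box volume, a rough proxy for how broad the resource's content is."""
        return math.prod(max(hi - lo, 0.0) for lo, hi in zip(self.lower, self.upper))


# Hypothetical 2-D boxes; real embeddings would be learned and higher-dimensional.
resources = {
    "broad_archive": Box(lower=(0.0, 0.0), upper=(4.0, 4.0)),
    "narrow_journal": Box(lower=(5.0, 5.0), upper=(6.0, 6.0)),
}
query_point = (1.0, 3.0)

# Assumed scoring rule: resources whose box is closer to the query rank higher.
ranked = sorted(resources, key=lambda name: resources[name].distance_to(query_point))
```

Unlike a single point, a box has extent along each dimension, so a broad resource can contain many distinct query points while a narrow one covers only a small region.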

Finally, we introduce FedRAGraph, a novel federated search method combining Retrieval-Augmented Generation (RAG) with graph-based indexing and hierarchical community detection. FedRAGraph constructs detailed knowledge graphs, segments them into semantically coherent communities, and generates hierarchical summaries via large language models, achieving superior precision, context-awareness, and iterative reasoning capabilities compared to state-of-the-art methods.
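
The graph-then-communities-then-summaries flow described above can be sketched roughly as follows. This is not FedRAGraph itself: connected components stand in for hierarchical community detection, `summarize` is a placeholder where a large language model call would go, and the triples are invented for the example.

```python
from collections import defaultdict


def build_graph(triples):
    """Build an undirected adjacency map from (head, relation, tail) triples."""
    graph = defaultdict(set)
    for head, _relation, tail in triples:
        graph[head].add(tail)
        graph[tail].add(head)
    return graph


def find_communities(graph):
    """Group nodes into connected components.

    This is a simple stand-in; it is flat, not hierarchical.
    """
    seen = set()
    communities = []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(graph[node] - members)
        seen |= members
        communities.append(sorted(members))
    return communities


def summarize(community):
    """Placeholder for an LLM-generated summary of one community."""
    return "Community of: " + ", ".join(community)


# Hypothetical triples, standing in for facts extracted from resource content.
triples = [
    ("aspirin", "treats", "headache"),
    ("headache", "symptom_of", "migraine"),
    ("BERT", "is_a", "language model"),
]
communities = find_communities(build_graph(triples))
summaries = [summarize(community) for community in communities]
```

In a retrieval setting, a query would be matched against the community summaries first, and only the matching communities would supply context to the generator.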

Details

1010268
Title
Resource Selection in Federated Search
Author
Number of pages
139
Publication year
2025
Degree date
2025
School code
0792
Source
DAI-A 86/12(E), Dissertation Abstracts International
ISBN
9798280753938
Advisor
Committee member
Dragut, Eduard C.; Zhang, Zhongfei Mark; Xi, Zhaohan; Qiao, Xingye
University/institution
State University of New York at Binghamton
Department
Computer Science
University location
United States -- New York
Degree
Ph.D.
Source type
Dissertation or Thesis
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
32042176
ProQuest document ID
3217382906
Document URL
https://www.proquest.com/dissertations-theses/resource-selection-federated-search/docview/3217382906/se-2?accountid=208611
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database
ProQuest One Academic