Over the past few decades, the IR community has been making a continuous effort to improve the efficiency of search in large collections of documents. Much research has focused on the efficiency of search engine query processing, and in particular on the top-𝑘 document retrieval problem: reporting the 𝑘 most relevant documents from a collection for a given query. Disjunctive top-𝑘 queries over simple ranking functions are commonly used to retrieve an initial set of candidate results that are then re-ranked by more complex, often machine-learned, rankers.
In this thesis, we investigate the top-𝑘 document retrieval problem from several angles. The first problem we encounter is efficient index representation and traversal. In particular, compression of posting lists is of utmost importance, since they account for much of the data size and access costs. Likewise, the choice of retrieval algorithm is crucial to query efficiency. We address these issues by evaluating a number of state-of-the-art index compression methods and safe disjunctive DAAT query processing algorithms. Our goal is to understand how much index compression performance impacts overall query processing speed, and how the choice of query processing algorithm depends on the compression method used. Moreover, it is well known that a good assignment of identifiers to documents can significantly improve index compression, and many query processing algorithms are also sensitive to this assignment. Reordering document identifiers in inverted indexes can therefore reduce storage costs and improve processing efficiency, though the trade-offs are not yet well understood. For this reason, we also assess how query performance is impacted by document reordering techniques.
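To make the traversal strategy concrete, the following minimal sketch (in Python, over in-memory posting lists rather than compressed cursors; the function name and data layout are illustrative, not the thesis implementation) shows the document-at-a-time control flow of exhaustive disjunctive top-𝑘 retrieval:

```python
import heapq

def daat_topk(postings, k):
    """Exhaustive disjunctive DAAT top-k retrieval.

    `postings` maps each query term to a dict {doc_id: score}. A real
    index stores compressed posting lists traversed by forward-seeking
    cursors, but the document-at-a-time control flow is the same.
    """
    # Visit candidate documents in increasing doc-id order, as DAAT does.
    doc_ids = sorted(set().union(*[p.keys() for p in postings.values()]))
    heap = []  # min-heap of (score, doc_id) holding the current top-k
    for doc in doc_ids:
        score = sum(p.get(doc, 0.0) for p in postings.values())
        if len(heap) < k:
            heapq.heappush(heap, (score, doc))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, doc))
    return sorted(heap, reverse=True)
```

Safe dynamic pruning algorithms such as MaxScore or WAND return exactly this result set but skip many of the per-document score computations.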
The second contribution focuses on the top-𝑘 threshold estimation problem: given a query 𝑞, estimate the score of the result at rank 𝑘. A good estimate of this score can yield significant performance improvements in several query processing scenarios, including disjunctive query processing algorithms. Several approaches have been proposed, including parametric methods, methods based on random sampling, and a recent approach based on machine learning. However, no previous work experimentally compares these approaches. In this thesis, we address this gap by reimplementing four major approaches and comparing them in terms of estimation error, running time, likelihood of overestimation, and end-to-end performance when applied to common classes of disjunctive top-𝑘 query processing algorithms. Lastly, we propose several optimizations that improve the accuracy and efficiency of threshold computation.
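As an illustration of the sampling family of estimators (a sketch under our own assumptions, not a reimplementation from the thesis), the best scores over a random sample of the collection can be extrapolated to an estimate of the 𝑘-th best score overall:

```python
import math
import random

def sample_threshold(scores, k, rate=0.01, seed=42):
    """Estimate the k-th highest query score from a random sample.

    `scores` holds the query score of every matching document; a real
    system would instead score a small pre-sampled sub-index. With
    sampling rate p, the ceil(k * p)-th best sampled score serves as
    an estimate of the k-th best score in the full collection.
    """
    rng = random.Random(seed)
    sample = [s for s in scores if rng.random() < rate]
    j = max(1, math.ceil(k * rate))
    if len(sample) < j:
        return 0.0  # too few sampled matches: fall back to a safe zero
    return sorted(sample, reverse=True)[j - 1]
```

Such an estimate can exceed the true threshold, which is why we measure the likelihood of overestimation: an unsafe (too high) threshold can cause true top-𝑘 results to be pruned.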
Due to the large size of most search collections, retrieving all potential matches is infeasible and undesirable. Most advanced query processing algorithms therefore use dynamic pruning techniques, which allow them to skip the evaluation of many documents while still returning the top-𝑘 most relevant documents for a given query with little or no loss in ranking effectiveness. While the fastest methods achieve impressive results on top-10 and top-100 queries, they tend to become much slower for the larger values of 𝑘 commonly used in candidate generation. In our third contribution, we propose new algorithms that achieve much faster query processing for values of 𝑘 up to the thousands or tens of thousands. Our algorithms build on the live-block filtering approach of Dimopoulos et al., which uses the SIMD operations of modern CPUs on fine-grained block-max data to determine a relatively small set of live blocks, i.e., areas of the index that might contain top-𝑘 results. In particular, we propose one method that integrates the MaxScore algorithm into the live-block approach, and another that applies a novel SIMD-optimized version of brute-force Term-at-a-Time (TAAT) processing to the live blocks. A detailed experimental comparison with the fastest known algorithms shows significant improvements for the new methods.
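The filtering step itself is simple; the following scalar Python sketch stands in for the SIMD implementation (names and the flat block-max layout are illustrative):

```python
def live_blocks(block_maxes, threshold):
    """Select the blocks that may contain top-k results.

    `block_maxes` holds one array per query term; entry b is the
    maximum score the term contributes anywhere in document range b.
    A block is live only if the summed per-term maxima can beat the
    current threshold. The SIMD version evaluates many blocks at
    once; the per-block logic is the same.
    """
    num_blocks = len(block_maxes[0])
    live = []
    for b in range(num_blocks):
        upper_bound = sum(term[b] for term in block_maxes)
        if upper_bound > threshold:
            live.append(b)
    return live
```

Only the documents inside live blocks are then evaluated, e.g., by MaxScore or by the SIMD-optimized brute-force TAAT variant.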
While query processing in search engines is a fairly complex process, most state-of-the-art systems process queries through a cascading pipeline, in which a first-stage model retrieves a candidate set of documents and one or more subsequent stages re-rank this set using contextualized language models such as BERT. We propose a new document term-weighting scheme suitable for efficient retrieval under transformer-based ranking approaches while leveraging a standard inverted index. It applies document expansion and uses a contextualized language model to directly estimate the semantic importance of tokens in a document, producing a single-value representation for each token in each document. This approach significantly outperforms prior first-stage retrieval approaches on effectiveness metrics and, when deployed in a re-ranking scenario, can match the effectiveness of state-of-the-art approaches with a significant speedup.
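As a sketch of how such single-value token weights can reuse a standard inverted index (linear quantization is our illustrative assumption; the actual scheme in the thesis may differ), the real-valued importance predicted by the model is quantized to a small integer impact stored in place of a term frequency:

```python
def quantize_impacts(term_weights, bits=8):
    """Quantize learned per-token importance scores to integer impacts.

    `term_weights` maps each token of one (expanded) document to the
    real-valued semantic importance predicted by a contextualized
    language model. Storing one small integer per posting preserves
    the standard inverted index layout and its compressibility.
    """
    max_weight = max(term_weights.values(), default=1.0)
    levels = (1 << bits) - 1  # e.g., 255 quantization levels for 8 bits
    return {t: max(1, round(w / max_weight * levels))
            for t, w in term_weights.items() if w > 0}
```

At query time, ranking then reduces to summing precomputed integer impacts, much as in classical impact-based indexes.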
Finally, modern Graphics Processing Units (GPUs), massively parallel architectures with thousands of cores and high memory bandwidth, are becoming more widely used and can achieve superior performance on many compute-intensive applications. We see potential in exploiting such architectures to improve query latency: their massive parallelism and fast context switching can hide memory latency and avoid dependency stalls by overlapping thread execution. In this thesis, we explore efficient ways to perform GPU-based inverted index decompression, an initial but fundamental step of query processing. The inverted index can be very large, so compression is necessary to reduce storage costs, while performing decompression directly on the GPU avoids data transfer during query processing. We propose two variations of existing CPU compression techniques that achieve faster integer list decompression on GPUs, based on suitable modifications to the data layout and decompression steps that account for the execution model and memory hierarchy of GPU architectures. This takes an initial step toward efficient query processing on GPUs.
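To illustrate the kind of layout modification involved (a simplified sketch in Python; real decoders also adapt the bit-packing format, and the chunking scheme here is our own assumption), delta-encoded gaps can be interleaved so that the 32 threads of a GPU warp read adjacent memory words in the same step, yielding coalesced accesses:

```python
def interleave_for_gpu(deltas, warp=32):
    """Reorder delta-encoded doc-id gaps for coalesced GPU decoding.

    On a CPU, one thread decodes a contiguous chunk of gaps. On a GPU,
    thread j of a warp should find its s-th value at position
    s * warp + j, so that all 32 threads of the warp touch adjacent
    words in every step of the decoding loop.
    """
    padded = deltas + [0] * (-len(deltas) % warp)  # pad to a warp multiple
    per_thread = len(padded) // warp
    chunks = [padded[j * per_thread:(j + 1) * per_thread]
              for j in range(warp)]  # logical chunk j goes to thread j
    return [chunks[j][s] for s in range(per_thread) for j in range(warp)]
```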