An Approach to Information Retrieval Based on Statistical Model Selection

Title: An Approach to Information Retrieval Based on Statistical Model Selection
Author: Efron, Miles
Abstract: Building on previous work in language modeling for information retrieval (IR), this paper proposes a novel approach to document ranking based on statistical model selection. The approach makes two main contributions. First, we posit the notion of a document's "null model," a language model that conditions our assessment of the document model's significance with respect to the query. Second, we introduce an information-theoretic model complexity penalty into document ranking. We rank documents by a penalized log-likelihood ratio that compares the likelihood that each document model generated the query with the likelihood that a corresponding "null" model generated it. Each model is assessed by the Akaike information criterion (AIC), an estimate of the expected Kullback-Leibler divergence between the observed model (null or non-null) and the underlying model that generated the data. We report experimental results in which the model selection approach improves on traditional LM retrieval.
Department: Information, School of
Subject: model selection
URI: http://hdl.handle.net/2152/414
Date: 2008-08

Files in this work

Download File: pll.pdf
Size: 1.843 MB
Format: application/pdf
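
Illustrative sketch (not taken from the paper): the ranking idea summarized in the abstract, a log-likelihood ratio between a document model and a null model with each side penalized by AIC, might look roughly like the Python below. Everything specific here is an assumption made for illustration only: the Dirichlet-smoothed unigram document model, the use of the collection language model as the null model, the parameter counts (distinct terms in the document for the document model, zero for the null model), and all function names. The paper's own definitions may differ.

```python
import math
from collections import Counter


def query_log_likelihood(query, doc_counts, doc_len, coll_counts, coll_len, mu):
    """Log-likelihood of the query under a Dirichlet-smoothed unigram model.

    With an empty document (doc_len == 0) this reduces to the collection
    model, which this sketch uses as the hypothetical null model.
    """
    ll = 0.0
    for term in query:
        coll_tf = coll_counts.get(term, 0)
        if coll_tf == 0:
            # Assumption: query terms unseen in the collection are skipped,
            # since they would have zero probability under every model.
            continue
        p_coll = coll_tf / coll_len
        p = (doc_counts.get(term, 0) + mu * p_coll) / (doc_len + mu)
        ll += math.log(p)
    return ll


def aic(log_lik, k):
    """Akaike information criterion: 2k - 2 * log-likelihood."""
    return 2 * k - 2 * log_lik


def rank(query, docs, mu=10.0):
    """Rank documents by half the AIC difference (null minus document).

    This equals (ll_doc - k_doc) - (ll_null - k_null), one possible form of
    a penalized log-likelihood ratio. Higher scores rank first.
    """
    coll_counts = Counter(t for tokens in docs.values() for t in tokens)
    coll_len = sum(coll_counts.values())

    # Hypothetical null model: the collection model, treated as having
    # no document-specific free parameters (k_null = 0).
    ll_null = query_log_likelihood(query, {}, 0, coll_counts, coll_len, mu)
    k_null = 0

    scored = []
    for doc_id, tokens in docs.items():
        doc_counts = Counter(tokens)
        ll_doc = query_log_likelihood(
            query, doc_counts, len(tokens), coll_counts, coll_len, mu
        )
        # Hypothetical complexity measure: distinct terms in the document.
        k_doc = len(doc_counts)
        score = 0.5 * (aic(ll_null, k_null) - aic(ll_doc, k_doc))
        scored.append((doc_id, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


docs = {
    "d1": "model selection for retrieval".split(),
    "d2": "cooking recipes for pasta".split(),
}
ranking = rank(["model", "selection"], docs)
```

In this toy collection both documents have the same number of distinct terms, so the penalty cancels and the ordering is driven by the likelihood ratio alone. With documents of differing vocabulary size, the assumed penalty would lower the score of the more complex document model.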
