Inverse Document Frequency

The Spärck Jones / Robertson IDF page

In 1972, Karen Spärck Jones published in the Journal of Documentation the paper that defined the term-weighting scheme now known as inverse document frequency (IDF). The paper was reprinted in 2004 as part of a celebration of 60 years of the Journal. In the same 2004 issue, Stephen Robertson wrote an analysis of the theoretical basis for IDF, and Karen wrote a reply.

This page links to copies of all of the above, plus an exchange of letters that took place shortly after the first publication. We are grateful to Emerald for permission to make these items available.

The original exchange in 1972 was part of the stimulus for the development (via a short paper [1] in 1974) of the Robertson/Spärck Jones (RSJ) relevance weighting model of 1976 [2]. However, the circle was not fully closed until the Croft/Harper paper of 1979 [3], which showed IDF to be an approximation to RSJ relevance weighting, together with a much later paper [4], which clarified the difference between the Croft/Harper approximation and the original formula. A short technical report [5] summarises the text retrieval methods developed in this framework, and a comprehensive paper [6] covers the combination of IDF weighting with other weighting factors and reports extensive experimental results.
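
As a rough illustration of the relationship between IDF and RSJ relevance weighting, the sketch below compares two widely quoted textbook forms: plain IDF, log(N/n), and the RSJ weight with the customary 0.5 corrections, evaluated with no relevance information (R = r = 0). These formulas and the function names `idf` and `rsj_weight` are assumptions made for illustration; they are not taken from this page, and the papers cited above should be consulted for the exact definitions.

```python
import math


def idf(N, n):
    """Plain IDF in its common textbook form: log(N / n).

    N = number of documents in the collection,
    n = number of documents containing the term.
    """
    return math.log(N / n)


def rsj_weight(N, n, R=0, r=0):
    """RSJ relevance weight with the usual 0.5 corrections (assumed form).

    R = number of known relevant documents,
    r = number of those that contain the term.
    With R = r = 0 this reduces to log((N - n + 0.5) / (n + 0.5)).
    """
    numerator = (r + 0.5) * (N - n - R + r + 0.5)
    denominator = (n - r + 0.5) * (R - r + 0.5)
    return math.log(numerator / denominator)


if __name__ == "__main__":
    N = 1_000_000
    for n in (10, 1_000, 100_000, 500_000, 900_000):
        print(f"n={n:>7}  idf={idf(N, n):8.4f}  rsj(no relevance)={rsj_weight(N, n):8.4f}")
```

Under these assumed forms the two weights are nearly identical for rare terms, but they diverge for frequent ones: when a term occurs in more than half of the collection, the RSJ form without relevance information becomes negative, whereas log(N/n) never does.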

Stephen Robertson
February 2005; revised March 2006