Download A Theory of Indexing by Gerard Salton PDF

By Gerard Salton

Presents a theory of indexing capable of ranking index terms, or subject identifiers, in decreasing order of importance. This leads to the choice of good document representations, and also accounts for the role of phrases and of thesaurus classes in the indexing process.

This study is typical of theoretical work in automatic information organization and retrieval, in that concepts are used from mathematics, computer science, and linguistics. A complete theory of information retrieval may emerge from an appropriate combination of these three disciplines.



Similar probability books

Applied Statistics and Probability for Engineers (5th Edition)

EISBN: 1118050177
eEAN: 9781118050170
ISBN-10: 0470053046
ISBN-13: 9780470053041

Montgomery and Runger's bestselling engineering statistics text provides a practical approach oriented to engineering as well as the chemical and physical sciences. Unique problem sets that reflect realistic situations show students how the material will be relevant in their careers. With a focus on how statistical tools are integrated into the engineering problem-solving process, all major aspects of engineering statistics are covered. Developed with sponsorship from the National Science Foundation, this text incorporates many insights from the authors' teaching experience along with feedback from numerous adopters of previous editions.

Stochastic approximation and recursive algorithms and applications

The book presents a thorough development of the modern theory of stochastic approximation or recursive stochastic algorithms for both constrained and unconstrained problems. There is a complete development of both probability one and weak convergence methods for very general noise processes. The proofs of convergence use the ODE method, the most powerful to date, with which the asymptotic behavior is characterized by the limit behavior of a mean ODE.

Proceedings of the Conference Foundations of Probability and Physics: Vaxjo, Sweden, 25 November-1 December, 2000

In this volume, leading experts in experimental as well as theoretical physics (both classical and quantum) and probability theory give their views on many intriguing (and still mysterious) problems regarding the probabilistic foundations of physics. The problems discussed during the conference include the Einstein-Podolsky-Rosen paradox, Bell's inequality, realism, nonlocality, the role of the Kolmogorov model of probability theory in quantum physics, von Mises frequency theory, quantum information, computation, and "quantum effects" in classical physics.

Extra resources for A Theory of Indexing

Example text

Let t be the total number of distinct terms assigned to the documents, n be the total number of documents, K be the average length of the document vectors (that is, the average number of nonzero terms), and K' be the average document frequency of a term (that is, the average number of documents to which a term is assigned). In increasing order of difficulty, the following computational requirements become necessary: for the weighting system based on collection or document frequencies (formulas (4) and (5)), K' additions are needed per term; for t terms, this produces K't additions.
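As a rough illustration of these quantities, the following Python sketch computes t, n, K, and K' for a small made-up collection and counts the additions needed to accumulate the document frequencies. The toy documents and the function name `collection_statistics` are assumptions introduced here for illustration; formulas (4) and (5) themselves are not reproduced.

```python
from collections import Counter

# Hypothetical toy collection: each document is the set of terms assigned to it.
documents = [
    {"index", "term", "weight"},
    {"term", "retrieval"},
    {"index", "retrieval", "thesaurus", "term"},
]


def collection_statistics(docs):
    """Return (t, n, K, K_prime, additions) for a list of term sets."""
    n = len(docs)                      # total number of documents
    doc_freq = Counter()               # document frequency of each term
    additions = 0                      # additions spent accumulating frequencies
    for doc in docs:
        for term in doc:
            doc_freq[term] += 1        # one addition per term assignment
            additions += 1
    t = len(doc_freq)                  # number of distinct terms
    K = sum(len(doc) for doc in docs) / n    # average document vector length
    K_prime = sum(doc_freq.values()) / t     # average document frequency
    return t, n, K, K_prime, additions


t, n, K, K_prime, additions = collection_statistics(documents)
print(t, n, K, K_prime, additions)
```

Because every term assignment is counted exactly once, the number of additions equals K't, and it also equals Kn, so the two averages are tied together by K't = Kn.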

A summarization of the complexity of the significance computations is given in Table 6. Since the discrimination value measure is dependent on the collection size, the calculations become automatically much more demanding than those required for the other measures.

TABLE 6
Computational complexity of significance computations

Significance   Computational requirements                       Overall order
measure                                                         (multiplications)
F or B         K't additions                                    -
EK             (2K' + 1)t additions                             o(K't)
               (K' + 2)t multiplications
S/N            (2K' + 1)t additions                             o(3K't)
               3K't multiplications
               2K't logarithms
DV             (2Kn + 4n + 2)t + 2Kn + 2n multiplications       o(2Knt)
               (2Kn + n + 3)t + 2Kn + n additions
               (n + 1)t square roots

The resulting thesaurus classes are not directly comparable to classes obtained by using only the low frequency terms for clustering purposes. However, the experimental recall-precision results may be close to those produced by the alternative, possibly preferred, methodology. The document frequency cutoff actually used for deciding on inclusion of a given term in the experimental thesauruses was 19, 15, and 19 for the CRAN, MED, and Time collections respectively; that is, terms with document frequencies smaller than or equal to the stated frequencies were included.
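The cutoff rule can be sketched in a few lines of Python. This is only an illustration of the inclusion test (document frequency smaller than or equal to the cutoff), not the thesaurus construction procedure itself; the function name `low_frequency_terms` and the toy documents are assumptions, while the cutoff values are those quoted in the text.

```python
from collections import Counter

# Cutoffs quoted in the text for the three experimental collections.
CUTOFFS = {"CRAN": 19, "MED": 15, "Time": 19}


def low_frequency_terms(docs, cutoff):
    """Return the terms whose document frequency is <= cutoff."""
    # set(doc) ensures a term is counted once per document, however often it occurs.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return {term for term, df in doc_freq.items() if df <= cutoff}


# Hypothetical toy collection, far smaller than CRAN, MED, or Time.
toy_docs = [
    ["a", "b"],
    ["a", "c"],
    ["a", "b", "d"],
]

included_1 = low_frequency_terms(toy_docs, 1)
included_2 = low_frequency_terms(toy_docs, 2)
print(sorted(included_1), sorted(included_2))
```

On a real collection the call would use the collection's own cutoff, for example `low_frequency_terms(cran_docs, CUTOFFS["CRAN"])`, where `cran_docs` is a hypothetical list of tokenized documents.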

