By Gerard Salton
Presents a theory of indexing capable of ranking index terms, or subject identifiers, in decreasing order of importance. This leads to the choice of good document representations, and also accounts for the role of phrases and of thesaurus classes in the indexing process.
This study is typical of theoretical work in automatic information organization and retrieval, in that concepts are used from mathematics, computer science, and linguistics. A complete theory of information retrieval may emerge from an appropriate combination of these three disciplines.
Similar probability books
Montgomery and Runger's bestselling engineering statistics text provides a practical approach oriented to engineering as well as the chemical and physical sciences. By providing unique problem sets that reflect realistic situations, it shows students how the material will be relevant in their careers. With a focus on how statistical tools are integrated into the engineering problem-solving process, all major aspects of engineering statistics are covered. Developed with sponsorship from the National Science Foundation, this text incorporates many insights from the authors' teaching experience along with feedback from numerous adopters of previous editions.
The book presents a thorough development of the modern theory of stochastic approximation, or recursive stochastic algorithms, for both constrained and unconstrained problems. There is a complete development of both probability one and weak convergence methods for very general noise processes. The proofs of convergence use the ODE method, the most powerful to date, with which the asymptotic behavior is characterized by the limit behavior of a mean ODE.
In this volume, leading experts in experimental as well as theoretical physics (both classical and quantum) and probability theory give their views on many intriguing (and still mysterious) problems regarding the probabilistic foundations of physics. The problems discussed during the conference include the Einstein-Podolsky-Rosen paradox, Bell's inequality, realism, nonlocality, the role of the Kolmogorov model of probability theory in quantum physics, von Mises frequency theory, quantum information, computation, and "quantum effects" in classical physics.
- Model Selection and Model Averaging
- Probability and Statistics for Engineers and Scientists 3e Solutions
- Probability and Partial Differential Equations in Modern Applied Mathematics
- Seminaire de Probabilites XX
- Seminaire de Probabilites XVII 1981 82
Extra resources for A Theory of Indexing
Let t be the total number of distinct terms assigned to the documents, n be the total number of documents, K be the average length of the document vectors (that is, the average number of nonzero terms), and K' be the average document frequency of a term (that is, the average number of documents to which a term is assigned). In increasing order of difficulty, the following computational requirements become necessary: for the weighting system based on collection or document frequencies (formulas (4) and (5)), K' additions are needed per term; for t terms, this produces K't additions.
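As a rough illustration of these quantities, the following Python sketch computes t, n, K, and K' for a tiny collection and counts one addition per (term, document) assignment, which gives the K't total stated above. The function name `collection_statistics` and the toy documents are assumptions made for illustration; they do not come from the book.

```python
from collections import Counter


def collection_statistics(documents):
    """Compute t, n, K, and K' for a collection.

    `documents` is a list of term sets, one set per document
    (a hypothetical representation chosen for this sketch).
    """
    n = len(documents)

    # Document frequency: number of documents each term is assigned to.
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))

    t = len(doc_freq)
    # Each (term, document) assignment costs one addition when the
    # frequencies are accumulated, so this total equals K' * t.
    total_assignments = sum(doc_freq.values())

    return {
        "t": t,
        "n": n,
        "K": total_assignments / n,        # average document vector length
        "K_prime": total_assignments / t,  # average document frequency
        "additions": total_assignments,
    }


# Toy collection (illustrative data only).
docs = [
    {"index", "term", "weight"},
    {"term", "retrieval"},
    {"index", "retrieval", "thesaurus", "term"},
]
stats = collection_statistics(docs)
print(stats)
```

Because every assignment is counted once per document and once per term, the sketch also shows that Kn = K't: both equal the total number of nonzero entries in the term-document matrix.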
A summarization of the complexity of the significance computations is given in Table 6. Since the discrimination value measure is dependent on the collection size, the calculations become automatically much more demanding than those required for the other measures.

TABLE 6. Computational complexity of significance computations

| Significance measure | Computational requirements | Overall order (multiplications) |
| F or B | K't additions | - |
| EK | (2K' + 1)t additions; (K' + 2)t multiplications | o(K't) |
| S/N | (2K' + 1)t additions; 3K't multiplications; 2K't logarithms | o(3K't) |
| DV | (2Kn + 4n + 2)t + 2Kn + 2n multiplications; (2Kn + n + 3)t + 2Kn + n additions; (n + 1)t square roots | o(2Knt) |
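To give a feel for the difference in scale, the sketch below evaluates the leading-order multiplication counts for the EK, S/N, and DV measures. The pairing of orders to measures (K't, 3K't, and 2Knt) is a reading of the damaged table, and the collection sizes are made-up values, so the numbers are illustrative only.

```python
def multiplication_order(t, n, K, K_prime):
    """Leading-order multiplication counts per significance measure.

    The formulas follow one reading of Table 6; the function name and
    argument layout are assumptions made for this sketch.
    """
    return {
        "EK": K_prime * t,       # o(K't)
        "S/N": 3 * K_prime * t,  # o(3K't)
        "DV": 2 * K * n * t,     # o(2Knt), grows with collection size n
    }


# Hypothetical collection: 5000 terms, 1000 documents, 50 terms per document.
t, n, K = 5000, 1000, 50
K_prime = K * n / t  # follows from Kn = K't
counts = multiplication_order(t, n, K, K_prime)
for measure, cost in counts.items():
    print(f"{measure}: {cost:,.0f} multiplications")
```

Only the DV count contains the factor n, which is why its cost climbs with the size of the collection while the other measures depend on K' and t alone.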
The resulting thesaurus classes are not directly comparable to classes obtained by using only the low frequency terms for clustering purposes. However, the experimental recall-precision results may be close to those produced by the alternative, possibly preferred, methodology. The document frequency cutoff actually used for deciding on inclusion of a given term in the experimental thesauruses was 19, 15, and 19 for the CRAN, MED, and Time collections respectively; that is, terms with document frequencies smaller than or equal to the stated frequencies were included.
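The inclusion rule can be sketched in a few lines of Python. The cutoffs are the ones quoted above; the function name `select_low_frequency_terms` and the sample document frequencies are hypothetical and serve only to show the "smaller than or equal to" comparison.

```python
# Document frequency cutoffs quoted in the text.
CUTOFFS = {"CRAN": 19, "MED": 15, "Time": 19}


def select_low_frequency_terms(doc_freq, cutoff):
    """Return the terms whose document frequency is <= cutoff."""
    return {term for term, df in doc_freq.items() if df <= cutoff}


# Made-up document frequencies for a few terms (illustrative only).
sample_doc_freq = {"boundary": 140, "ablation": 15, "shock": 19, "flutter": 16}

med_terms = select_low_frequency_terms(sample_doc_freq, CUTOFFS["MED"])
cran_terms = select_low_frequency_terms(sample_doc_freq, CUTOFFS["CRAN"])
print(sorted(med_terms))
print(sorted(cran_terms))
```

A term whose frequency equals the cutoff is kept, so with the sample data the cutoff of 15 admits one term and the cutoff of 19 admits three.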