TermAnalysis

Set TermAnalysis to True to return a summary of the counts of terms in different classes. The summary contains the following information:

  • Terms. The total number of terms.

  • Numeric. The number of purely numeric terms.

  • Alphanumeric. The number of alphanumeric terms (this count excludes purely numeric terms).

  • Multibyte. The number of terms that include at least one multi-byte character.

  • Dococcs logn. The number of terms that have the associated number of document occurrences.

  • Length len. The number of terms of each length.

  • DistinctTermsPerDoc logn. The number of documents that contain the associated number of distinct terms.

  • TermsPerDoc logn. The number of documents that contain the associated number of terms.

NOTE: Logn=N groups items by the base-2 logarithm of the property value, rounded up. For example:

    • Logn=0 means items that have 1 (2^0) of this property (for example, documents with only 1 distinct term).

    • Logn=1 means items that have 2 (2^1) of this property.

    • Logn=2 means items that have 3-4 (more than 2^1, up to 2^2) of this property.

    • Logn=3 means items that have 5-8 (more than 2^2, up to 2^3) of this property.
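
The bucketing described in the note can be sketched in a few lines of Python. This is only an illustration of the arithmetic, assuming the rule is "the smallest N such that the count is at most 2^N", which matches the four examples above. The function names and the sample counts are hypothetical and are not part of the product.

```python
from collections import Counter


def logn_bucket(count: int) -> int:
    """Return the smallest N such that count <= 2**N (assumed bucketing rule)."""
    if count < 1:
        raise ValueError("count must be at least 1")
    # (count - 1).bit_length() equals ceil(log2(count)) for positive integers,
    # without the rounding problems of floating-point logarithms.
    return (count - 1).bit_length()


def logn_histogram(counts):
    """Count how many items fall into each Logn bucket."""
    return dict(sorted(Counter(logn_bucket(c) for c in counts).items()))


# Hypothetical distinct-term counts for six documents.
distinct_terms_per_doc = [1, 2, 3, 4, 5, 8]
print(logn_histogram(distinct_terms_per_doc))  # {0: 1, 1: 1, 2: 2, 3: 2}
```

In this sample, the documents with 3 and 4 distinct terms share the Logn=2 bucket, and the documents with 5 and 8 distinct terms share the Logn=3 bucket.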

Actions: TermGetAll
Type: Boolean
Default: False
Example: TermAnalysis=True