Set TermAnalysis to True to return a summary of the counts of terms in different classes. The summary contains the following information:
Terms. The total number of terms.
Numeric. The number of purely numeric terms.
Alphanumeric. The number of alphanumeric terms (this count excludes purely numeric terms).
Multibyte. The number of terms that include at least one multi-byte character.
Dococcs logn. The number of terms that have the associated number of document occurrences.
Length len. The number of terms of each length.
DistinctTermsPerDoc logn. The number of documents that contain the associated number of distinct terms.
TermsPerDoc logn. The number of documents that contain the associated number of terms.
NOTE: Logn=N means that the log (base 2) of the property value, rounded up to the nearest integer, is N. For example:
Logn=0 means items that have 1 (2^0) of this property (for example, documents with only 1 distinct term).
Logn=1 means items that have 2 (2^1) of this property.
Logn=2 means items that have 3-4 (more than 2^1, up to 2^2) of this property.
Logn=3 means items that have 5-8 (more than 2^2, up to 2^3) of this property.
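The bucketing in the note above can be sketched in a few lines of Python. This is only an illustration: the function name logn_bucket is hypothetical, and the round-up rule is inferred from the four examples above, not taken from the server implementation.

```python
def logn_bucket(count: int) -> int:
    """Return the Logn bucket for a positive count (hypothetical helper).

    Assumption: the bucket is log2(count) rounded up, which matches the
    documented examples (1 -> 0, 2 -> 1, 3-4 -> 2, 5-8 -> 3).
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    # For count >= 1, (count - 1).bit_length() equals ceil(log2(count))
    # and avoids floating-point rounding problems.
    return (count - 1).bit_length()


if __name__ == "__main__":
    # Print the bucket for a few sample counts.
    for count in (1, 2, 3, 4, 5, 8, 9):
        print(f"count={count} -> Logn={logn_bucket(count)}")
```

Under this assumption, a document with 6 distinct terms would be counted in the DistinctTermsPerDoc entry for Logn=3.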
Actions: | TermGetAll |
Type: | Boolean |
Default: | False |
Example: | TermAnalysis=True |
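As a rough sketch of how the parameter might be attached to a TermGetAll request, the Python below builds a request URL. The host, the port, the helper name, and the "/action=" URL layout are all assumptions made for illustration; check them against your own server configuration before use.

```python
from urllib.parse import urlencode


def build_term_analysis_url(host: str, port: int) -> str:
    """Build a TermGetAll request URL with TermAnalysis enabled.

    Hypothetical helper: the host, port, and URL layout are assumptions,
    not values taken from this reference page.
    """
    # TermAnalysis is a Boolean parameter whose default is False.
    params = urlencode({"TermAnalysis": "True"})
    return f"http://{host}:{port}/action=TermGetAll&{params}"


if __name__ == "__main__":
    # Placeholder host and port; replace with your server's values.
    print(build_term_analysis_url("localhost", 9100))
```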