TermAnalysis
Set TermAnalysis to True to return a summary of the counts of terms in different classes. It returns the following information:
- Terms. The total number of terms.
- Numeric. The number of purely numeric terms.
- Alphanumeric. The number of alphanumeric terms (this count excludes purely numeric terms).
- Multibyte. The number of terms that include at least one multi-byte character.
- Dococcs logn. The number of terms that have the associated number of document occurrences.
- Length len. The number of terms of each length.
- DistinctTermsPerDoc logn. The number of documents that contain the associated number of distinct terms.
- TermsPerDoc logn. The number of documents that contain the associated number of terms.
NOTE: Logn=N means that N is the base-2 logarithm of the count, rounded up to the next integer. For example:
- Logn=0 means items that have 1 (2^0) of this property (for example, documents with only 1 distinct term).
- Logn=1 means items that have 2 (2^1) of this property.
- Logn=2 means items that have 3-4 (more than 2^1, up to 2^2) of this property.
- Logn=3 means items that have 5-8 (more than 2^2, up to 2^3) of this property.
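The bucketing described in the note can be sketched in Python. This is an illustration only: it assumes that the Logn value is the base-2 logarithm of the count rounded up, which is what the examples suggest. The names logn_bucket and bucket_histogram and the sample counts are hypothetical and are not part of the product.

```python
from collections import Counter


def logn_bucket(count: int) -> int:
    """Return the base-2 logarithm of count, rounded up (count must be >= 1)."""
    if count < 1:
        raise ValueError("count must be at least 1")
    # For count >= 1, (count - 1).bit_length() equals ceil(log2(count))
    # and avoids floating-point rounding.
    return (count - 1).bit_length()


def bucket_histogram(counts):
    """Count how many items fall into each Logn bucket."""
    return Counter(logn_bucket(c) for c in counts)


# Hypothetical distinct-term counts for six documents.
distinct_terms_per_doc = [1, 2, 3, 4, 5, 8]
histogram = bucket_histogram(distinct_terms_per_doc)
for bucket in sorted(histogram):
    print(f"Logn={bucket}: {histogram[bucket]}")
```

For these sample counts, buckets 0 and 1 each hold one document, and buckets 2 and 3 each hold two, which matches the ranges listed in the note.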
Actions: | TermGetAll |
Type: | Boolean |
Default: | False |
Example: | TermAnalysis=True |
See Also: |
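As a rough sketch of how the parameter might be sent with the TermGetAll action, the following Python builds a request URL. It assumes that the server accepts actions as HTTP requests of the form http://host:port/action=Name&Parameter=Value. The host, the port, and the helper name build_term_analysis_url are hypothetical; substitute the values for your own installation.

```python
from urllib.parse import urlencode


def build_term_analysis_url(host: str, port: int) -> str:
    """Build a TermGetAll request URL with TermAnalysis enabled.

    Assumes an action URL of the form http://host:port/action=Name&Param=Value.
    """
    params = {"TermAnalysis": "True"}
    return f"http://{host}:{port}/action=TermGetAll&{urlencode(params)}"


# Hypothetical host and port, for illustration only.
print(build_term_analysis_url("localhost", 9000))
```

The sketch only constructs the URL and does not send the request, so it can be run without a server.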