Glossary
ACI (Autonomy Content Infrastructure)
A technology layer that automates operations on unstructured information for cross-enterprise applications. ACI enables an automated and compatible business-to-business, peer-to-peer infrastructure. The ACI allows enterprise applications to understand and process content that exists in unstructured formats, such as email, Web pages, Microsoft Office documents, and IBM Notes.
ACI server
A server component that runs on the Autonomy Content Infrastructure (ACI).
ACL (access control list)
An ACL is metadata associated with a document that defines which users and groups are permitted to access the document.
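A minimal sketch of the access check an ACL supports, assuming a simplified ACL that just lists allowed users and groups. The `DocumentACL` class and its fields are hypothetical. Real IDOL ACLs are encoded strings managed by the security components, and this sketch ignores deny entries.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentACL:
    """Hypothetical, simplified ACL: the users and groups allowed to access one document."""

    allowed_users: set[str] = field(default_factory=set)
    allowed_groups: set[str] = field(default_factory=set)

    def permits(self, user: str, groups: set[str]) -> bool:
        # Grant access if the user is listed directly or shares at least one group with the ACL.
        return user in self.allowed_users or bool(groups & self.allowed_groups)


acl = DocumentACL(allowed_users={"alice"}, allowed_groups={"finance"})
print(acl.permits("alice", set()))      # True: listed user
print(acl.permits("bob", {"finance"}))  # True: member of an allowed group
print(acl.permits("carol", {"hr"}))     # False
```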
action
A request sent to an ACI server.
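ACI servers receive actions as HTTP GET requests in which the action name and its parameters are encoded in the URL. The sketch below only builds such a URL, assuming the `http://host:port/action=Name&param=value` form; the host, the port 9000, and the helper name `build_action_url` are placeholders, so check your server's configuration and action reference.

```python
from urllib.parse import urlencode


def build_action_url(host: str, port: int, action: str, **params: str) -> str:
    """Hypothetical helper: build an ACI-style action URL (the request is not sent)."""
    # The action name comes first, followed by its parameters, all URL-encoded.
    query = urlencode({"action": action, **params})
    return f"http://{host}:{port}/{query}"


url = build_action_url("localhost", 9000, "Query", Text="cats", MaxResults="5")
print(url)  # http://localhost:9000/action=Query&Text=cats&MaxResults=5
```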
Active Directory
A directory service for Microsoft Windows domain networks, which uses LDAP to authenticate users and computers on a network.
Category component
The IDOL Server component that manages categorization and clustering.
Community component
The IDOL Server component that manages users and communities.
compiled grammar
A grammar file that has been compiled from XML into ECR file format using the Eduction command-line tool edktool, so that Eduction can use it directly. See also: XML, ECR file, grammar, standard grammar, user grammar.
component
An attribute of a matched entity (a component of the single match), for example a topic or sentiment.
connector
An IDOL component (for example File System Connector) that retrieves information from a local or remote repository (for example, a file system, database, or Web site).
Connector Framework Server (CFS)
Connector Framework Server processes the information that is retrieved by connectors. Connector Framework Server uses KeyView to extract document content and metadata from over 1,000 different file types. When the information has been processed, it is sent to an IDOL Server or Distributed Index Handler (DIH).
Content component
The IDOL Server component that manages the data index and performs most of the search and retrieval operations from the index.
DAH (Distributed Action Handler)
DAH distributes actions to multiple copies of IDOL Server or a component. It allows you to use failover, load balancing, or distributed content.
database
An IDOL Server data pool that stores indexed information. The administrator can set up one or more databases and specify how data is fed to them. By default, IDOL Server contains the databases Profile, Agent, Activated, Deactivated, News, and Archive.
DIH (Distributed Index Handler)
DIH allows you to efficiently split and index extremely large quantities of data into multiple copies of IDOL Server or the Content component. DIH allows you to create a scalable solution that delivers high performance and high availability. It provides a flexible way to batch, route, and categorize the indexing of internal and external content into IDOL Server.
ECR file
ECR is a proprietary format for grammar files that Eduction can easily read at runtime. You can write grammar files in XML, then use the Eduction command-line tool edktool to compile them into ECR format. See also: XML, compiled grammar.
edktool
A command-line tool for compiling and testing Eduction grammars.
Eduction
The process of extracting entities (patterns of text) from documents.
engine
The part of any Eduction component that processes text and performs extraction and redaction operations. You can access the engine by using the Eduction SDK, Eduction Server, or an IDOL ingestion component (CFS or IDOL NiFi Ingest).
entity
In Eduction, an entity is a word, phrase, or block of information that the Eduction component can match and extract from documents. An entity can be a specific text string, such as a name, or it can be a pattern of text such as an address or phone number. You define the pattern in a grammar, which Eduction uses to find the entities in documents.
extraction
Eduction extracts entities from documents based on the rules you have created in your dictionaries and grammars, and returns an XML list of matches, or adds the matches to the source document as new fields. See also: XML, grammar, dictionary.
field
Fields define different parts of content in IDOL documents, such as the title, content, and metadata information.
grammar
In Eduction, a grammar is a pattern that defines an entity.
headword
A word or short phrase that Eduction matches in an entity (for example, the name of a person or place).
IDOL
The Intelligent Data Operating Layer (IDOL) Server, which integrates unstructured, semi-structured and structured information from multiple repositories through an understanding of the content. It delivers a real-time environment in which operations across applications and content are automated.
IDOL Proxy
An IDOL Server component that accepts incoming actions and distributes them to the appropriate subcomponent. IDOL Proxy also performs some maintenance operations to make sure that the subcomponents are running, and to start and stop them when necessary.
IDX
A structured file format that can be indexed into IDOL Server. You can use a connector to import files into this format, or you can create IDX files manually.
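A minimal sketch of what a single IDX document record looks like, generated in Python. It assumes the common `#DREREFERENCE`, `#DRETITLE`, `#DREFIELD`, `#DRECONTENT`, and `#DREENDDOC` markers; the helper `to_idx` is hypothetical, it does not escape double quotes in field values, and a complete file may need further markers (for example, an end-of-data marker), so check the IDX format reference before indexing real data.

```python
def to_idx(reference: str, title: str, fields: dict[str, str], content: str) -> str:
    """Hypothetical helper: build one IDX document record as text."""
    lines = [f"#DREREFERENCE {reference}", f"#DRETITLE {title}"]
    # Each metadata field becomes a #DREFIELD line; values must not contain double quotes.
    for name, value in fields.items():
        lines.append(f'#DREFIELD {name}="{value}"')
    # The document body follows #DRECONTENT and ends at #DREENDDOC.
    lines.append("#DRECONTENT")
    lines.append(content)
    lines.append("#DREENDDOC")
    return "\n".join(lines) + "\n"


print(to_idx("doc1", "Hello", {"AUTHOR": "Smith"}, "Body text"), end="")
```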
importing
After a document has been downloaded from the repository in which it is stored, it is converted into IDX or XML file format. This process is called “importing”.
index
The IDOL Server data index contains document content and field information for analysis and retrieval.
indexing
The process of storing data in IDOL Server. IDOL Server stores data in different field types (such as index, numeric, and ordinary fields). Storing data in the appropriate field types is important for good performance.
Intellectual Asset Protection System (IAS)
An integrated security solution to protect your data. At the front end, authentication checks that users are allowed to access the system that contains the result data. At the back end, entitlement checking and authentication combine to ensure that query results contain only documents that the user is allowed to see, from repositories that the user has permission to access. For more information, refer to the IDOL Document Security Administration Guide.
KeyView
The IDOL component that extracts data, including text, metadata, and subfiles from over 1,000 different file types. KeyView can also convert documents to HTML format for viewing in a Web browser.
landmark
A value that identifies a particular entity, without being a part of the entity value. For example, the phrase "Date of Birth" is a landmark for an entity that extracts dates of birth.
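To illustrate how a landmark anchors a match without being part of it, here is a sketch in Python regular-expression syntax (not the Eduction grammar language). It assumes dates of birth are written as DD/MM/YYYY; the pattern and function name are hypothetical. Only the date after the landmark "Date of Birth" is captured, so the unrelated date earlier in the sample is ignored.

```python
import re

# The landmark "Date of Birth" must precede the date, but only the date itself is captured.
DOB_PATTERN = re.compile(r"Date of Birth:?\s*(\d{2}/\d{2}/\d{4})", re.IGNORECASE)


def extract_dates_of_birth(text: str) -> list[str]:
    """Hypothetical helper: return dates that follow the 'Date of Birth' landmark."""
    return DOB_PATTERN.findall(text)


sample = "Issued 01/02/2020. Date of Birth: 14/07/1985."
print(extract_dates_of_birth(sample))  # ['14/07/1985']
```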
LDAP
Lightweight Directory Access Protocol. Applications can use LDAP to retrieve information from a server. LDAP is used for directory services (such as corporate email and telephone directories) and user authentication. See also: active directory, primary domain controller.
License Server
License Server enables you to license and run multiple IDOL solutions. You must have a License Server on a machine with a known, static IP address.
-
See sentiment analysis.
Lua
An embedded scripting language that you can use to write custom scripts to expand certain IDOL functionality.
Luhn algorithm
A checksum formula used to validate identification numbers, such as credit card numbers and some national identification numbers (for example, Canadian Social Insurance Numbers). The formula checks for errors by performing arithmetic on the other digits of the number to calculate a value that must agree with the final digit (the check digit).
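A minimal Python sketch of the Luhn check, using the equivalent form that includes the check digit in the sum: starting from the rightmost digit, double every second digit (subtracting 9 from any result above 9), add everything up, and accept the number if the total is a multiple of 10. The function name is hypothetical, and non-digit characters such as spaces are simply skipped.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn check."""
    digits = [int(ch) for ch in number if "0" <= ch <= "9"]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit (the check digit) and double every second digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    # The number is valid when the total is a multiple of 10.
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))  # True
print(luhn_valid("4111 1111 1111 1112"))  # False
```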
metadata
Data that describes and gives information about other data. For example, the metadata for a text document might include information about the author of the document, the date it was written, or a short summary.
OmniGroupServer (OGS)
A server that manages access permissions for your users. It communicates with your repositories and IDOL Server to apply access permissions to documents.
parsing
The process of analyzing text according to the rules of a formal grammar.
pattern
A description of the entity you want to extract, which enables Eduction to produce a list of matches based on that pattern. Usually, a pattern specifies in general terms what a match looks like (for example, phone numbers), by using regular expressions. You can also use it to specify an exact list, but in this case you usually use headwords. See also: entity, extraction, grammar, headword, regular expressions.
-
A number, usually between 0.50 and 1.50, that represents the strength of the sentiment in the matched phrase.
post-processing script
A script that performs additional processing on matched entities. This script can validate matches (for example, to calculate a checksum for an ID number) and discard matches that do not meet the script requirements.
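In IDOL these scripts are typically written in Lua; the Python sketch below shows only the filtering idea. It assumes the matches are date strings in DD/MM/YYYY format and discards any that look like dates but do not exist on the calendar. The function name and format are hypothetical.

```python
from datetime import datetime


def keep_valid_dates(matches: list[str], fmt: str = "%d/%m/%Y") -> list[str]:
    """Hypothetical post-processing step: discard matches that are not real calendar dates."""
    valid = []
    for match in matches:
        try:
            datetime.strptime(match, fmt)
        except ValueError:
            continue  # For example, 31/02/2020 matches the pattern but is not a real date.
        valid.append(match)
    return valid


print(keep_valid_dates(["14/07/1985", "31/02/2020", "29/02/2024"]))  # ['14/07/1985', '29/02/2024']
```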
precision
Precision is the percentage of extracted entities that are true entities. See also: recall.
primary domain controller (PDC)
A server computer in a Microsoft Windows domain that controls various computer resources. See also: active directory, LDAP.
recall
The recall of an extraction is the percentage of the true entities in the text that the extraction actually returns. See also: precision.
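Precision and recall can be computed side by side once you know which entities an extraction should have returned. The sketch below treats matches as plain strings and ignores their positions in the text, which is a simplification; the sample data is invented purely for illustration.

```python
def precision_recall(extracted: set[str], expected: set[str]) -> tuple[float, float]:
    """Return (precision, recall) as fractions between 0 and 1."""
    true_positives = len(extracted & expected)
    # Precision: share of extracted entities that are true entities.
    precision = true_positives / len(extracted) if extracted else 0.0
    # Recall: share of true entities that the extraction returned.
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall


extracted = {"Alice", "Bob", "Paris", "Monday"}
expected = {"Alice", "Bob", "Paris", "London", "Rome"}
p, r = precision_recall(extracted, expected)
print(f"precision={p:.0%} recall={r:.0%}")  # precision=75% recall=60%
```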
regular expression
A string that allows you to define a particular string pattern in a concise format. Matching in Eduction uses regular expressions to define what you want to match.
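As an example of a concise pattern, the sketch below uses Python's `re` module to match US-style phone numbers such as 555-123-4567 or (555) 123-4567. The pattern is hypothetical and illustrative only, and Eduction's regular-expression dialect may differ from Python's in some details.

```python
import re

# Optional parentheses around a 3-digit area code, an optional separator, then 3 and 4 digits.
PHONE = re.compile(r"\(?\b\d{3}\)?[-.\s]?\d{3}[-.]\d{4}\b")

text = "Call 555-123-4567 or (555) 987-6543."
print(PHONE.findall(text))  # ['555-123-4567', '(555) 987-6543']
```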
relevance
The similarity that a particular query result has to the initial query. IDOL Server assigns each result a percentage relevance score according to how closely it matches the query criteria.
sentiment analysis
A form of Eduction that identifies positive and negative sentiment in text.
standard grammar
Eduction includes a set of standard grammars that allow you to extract the most common entities, such as person, place, or company names, legal terms, addresses, dates, and times. See also: entity, compiled grammar, grammar, user grammar.
tagging
The process of adding extra information to documents. The tag might be a category, or entities returned from Eduction. Tagging usually adds a field to a document, which you can then use to search for documents by tag.
tokenization
The process of analyzing text to split it into tokens. See also: tokens.
tokens
IDOL Server stores document text as a series of tokens. Generally, a token is a word, but it can also include other strings of characters (such as a phone number or email address).
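A toy tokenizer illustrating the idea that most tokens are words while some longer strings stay whole. This is not IDOL's tokenization, which is configured on the server; the rules below (keep email addresses and hyphenated phone numbers intact, otherwise split into words) are assumptions chosen for the example.

```python
import re

# Hypothetical rules, tried in order: email address, NNN-NNN-NNNN phone number, plain word.
TOKEN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+|\d{3}-\d{3}-\d{4}|\w+")


def tokenize(text: str) -> list[str]:
    """Split text into tokens according to the rules above."""
    return TOKEN.findall(text)


print(tokenize("Email jo.smith@example.com or call 555-123-4567 today"))
# ['Email', 'jo.smith@example.com', 'or', 'call', '555-123-4567', 'today']
```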
user grammar
XML files, created by the user in the Eduction grammar language, that describe the entities to locate in text.
View
An IDOL component that converts files in a repository to HTML format for viewing in a Web browser.
wildcard
A character that stands in for any character or group of characters in a query.
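A sketch of wildcard matching using Python's `fnmatch` module, assuming the common convention where `*` stands for any run of characters and `?` for exactly one character. `fnmatch` implements shell-style globbing (it also treats `[...]` as a character set), so treat this as an illustration of the concept rather than IDOL's query wildcard syntax.

```python
from fnmatch import fnmatchcase


def wildcard_filter(pattern: str, terms: list[str]) -> list[str]:
    """Return the terms that match a wildcard pattern ('*' = any run, '?' = one character)."""
    return [term for term in terms if fnmatchcase(term, pattern)]


terms = ["color", "colour", "colors", "collar"]
print(wildcard_filter("colo*r", terms))  # ['color', 'colour']
print(wildcard_filter("col?r", terms))   # ['color']
```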
XML
Extensible Markup Language. XML is a language that defines the different attributes of document content in a format that can be read by humans and machines. In IDOL Server, you can index documents in XML format. IDOL Server also returns action responses in XML format.