You can submit standard HTTP requests to the IDOL ACI and Service ports to get information about your IDOL components, such as their status and whether individual components are running. You can use this information for manual analysis, or to configure automated monitoring tools such as Nagios, which can send HTTP requests and check the responses by using the check_http plugin (for more information, refer to the Nagios documentation).
In addition to submitting actions, you can also use the Overview tab on the Status page of IDOL Admin to view information about the status and availability of your components.
You can also use statistics to monitor your components.
You can use the following methods to check the availability of your components.
To check that a component is running, submit an /action=GetStatus request to the service port of the component. If the component is running, the response body includes <RESPONSE>Running</RESPONSE>.
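The service port check described above can be scripted. The following is a minimal Python sketch, not a definitive implementation: the function names, the host, and the port value are hypothetical, and it assumes that the service port answers a plain HTTP GET.

```python
import http.client
import urllib.request


def service_is_running(body: str) -> bool:
    """Return True if a GetStatus response body reports a running component."""
    return "<RESPONSE>Running</RESPONSE>" in body


def check_service_port(host: str, service_port: int, timeout: float = 5.0) -> bool:
    """Send /action=GetStatus to the service port and inspect the response.

    Any connection failure, timeout, or malformed HTTP reply counts as "not running".
    """
    url = f"http://{host}:{service_port}/action=GetStatus"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except (OSError, http.client.HTTPException):
        return False
    return service_is_running(body)
```

For example, check_service_port("idolhost", 9102) returns True only if the component replies and the body contains the Running marker. Both the host name and the port number here are placeholders.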
To check that an IDOL component’s ACI port is responding, and that the component has threads available to handle incoming ACI requests, send an /action=GetPID request. If the ACI port is responding normally, the response body includes <response>SUCCESS</response>.
This action is available on all ACI servers, but is particularly relevant for components with ACI threads that are processing long-running requests (for example, Content or DAH).
The advantage of this method is that it adds very little processing time, performance impact, or network overhead.
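If you use a monitoring tool that follows the Nagios plugin exit code convention (0 for OK, 2 for CRITICAL), the GetPID check might be wrapped as in the following Python sketch. The function names and the timeout value are hypothetical. Treating a timeout as a failure is an assumption, based on the idea that a request might not be answered promptly when no ACI thread is free.

```python
import http.client
import urllib.request

# Nagios plugin exit codes (standard plugin convention).
OK, CRITICAL = 0, 2


def evaluate_getpid(body):
    """Map a GetPID response body (or None if no response arrived) to a result."""
    if body is None:
        return CRITICAL, "CRITICAL - no response from ACI port"
    if "<response>SUCCESS</response>" in body:
        return OK, "OK - ACI port responding"
    return CRITICAL, "CRITICAL - unexpected GetPID response"


def fetch_getpid(host, aci_port, timeout=3.0):
    """Send /action=GetPID; return the body, or None on error or timeout."""
    url = f"http://{host}:{aci_port}/action=GetPID"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except (OSError, http.client.HTTPException):
        return None
```

A wrapper script could call evaluate_getpid(fetch_getpid(host, port)), print the message, and exit with the returned code.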
To check the status of ACI threads, use the ACIThreadStatus service port action. The action returns information on the current state of all ACI threads, even if the ACI port is unavailable (for example, if all threads are servicing long-running actions).
You can use this action to detect and flag up long-running queries in real time.
The action returns the following information:
Whether a thread is in progress (that is, currently servicing a request).
Whether a thread is idle.
The number of accepted connections (ACI requests which have been accepted, but cannot yet be processed because no ACI thread is available to handle them).
In the case of active threads, the action that is being processed, and the currently elapsed time.
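Once the ACIThreadStatus response has been parsed, flagging long-running requests is a simple filter. The following Python sketch assumes that the response has already been converted into records. The ThreadInfo type and its field names are hypothetical and do not reflect the actual response layout, which you should take from the action's reference documentation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ThreadInfo:
    """Hypothetical, already-parsed view of one ACI thread."""
    thread_id: int
    in_progress: bool                 # True if currently servicing a request
    action: Optional[str] = None      # action being processed, if any
    elapsed_seconds: float = 0.0      # currently elapsed time, if in progress


def long_running(threads: List[ThreadInfo], threshold_seconds: float) -> List[ThreadInfo]:
    """Return the active threads whose elapsed time exceeds the threshold."""
    return [t for t in threads
            if t.in_progress and t.elapsed_seconds > threshold_seconds]


def all_threads_busy(threads: List[ThreadInfo]) -> bool:
    """Return True if there is at least one thread and none of them is idle."""
    return bool(threads) and all(t.in_progress for t in threads)
```

The threshold is a local policy choice. A value suited to Content queries might be too strict for actions that are expected to run for a long time.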
This method is relevant to components with an index port (for example, Content and DIH).
To check that the index port can be contacted and is responding to incoming requests, send the /PING? request to the index port. If the index port is running normally, this returns the string PING SUCCESS.
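The index port check can be sketched in the same way. This Python example assumes that the index port accepts a plain HTTP GET for the /PING? request. The function names, host, and port are hypothetical.

```python
import http.client
import urllib.request


def ping_succeeded(body: str) -> bool:
    """Return True if the index port reply contains the success string."""
    return "PING SUCCESS" in body


def index_port_responding(host: str, index_port: int, timeout: float = 5.0) -> bool:
    """Send /PING? to the index port; treat any error or timeout as a failure."""
    url = f"http://{host}:{index_port}/PING?"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except (OSError, http.client.HTTPException):
        return False
    return ping_succeeded(body)
```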
This method is relevant to components with children (for example, IDOL Proxy, DIH, and DAH).
To check that child components are running normally, submit the ACI GetStatus action. The response contains the following information on the current state of the child components:
IDOL Proxy: The component/componentname/status response is RUNNING if the child component is running normally.
Because the Proxy component allocates port numbers to its children, it might be easier to monitor their state by using the Proxy, rather than by contacting the children directly.
DIH: The engines/engine/status response is UP if the engine is running normally. The response reports the last-known state of the engine (that is, it might be cached, with a maximum age dependent on the configured ping time).
DAH: The engines/engine/status response is ONLINE if the engine is running and queryable. DAH collects current information from its children when it receives a GetStatus request, so child statuses are current; however, if one or more children are down, this might incur a timeout penalty when attempting to connect.
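The engines/engine/status values for DIH and DAH can be extracted from a GetStatus response with a standard XML parser. The following Python sketch assumes that the status elements appear without a namespace prefix, nested as engines, then engine, then status, somewhere below the root element. Check the surrounding structure against a real response. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Status value that indicates a healthy engine, per parent component.
HEALTHY_STATUS = {"DIH": "UP", "DAH": "ONLINE"}


def unhealthy_engines(xml_text: str, component: str):
    """Return (position, status) for each engine whose status is not healthy.

    `component` is "DIH" or "DAH". The position is the zero-based order of
    the engine in the response.
    """
    expected = HEALTHY_STATUS[component]
    root = ET.fromstring(xml_text)
    bad = []
    for position, status in enumerate(root.findall(".//engines/engine/status")):
        value = (status.text or "").strip()
        if value.upper() != expected:
            bad.append((position, value))
    return bad
```

An empty list means that every engine reported the expected status. For IDOL Proxy, the same approach would apply to the component/componentname/status path, with RUNNING as the expected value.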
The IDOL Content component’s GetStatus action returns detailed information that can indicate the health of the component. For example:
If the number of documents is much smaller than the number of committed_documents, you might need to compact the engine.
If the number of total_terms increases suddenly, or is significantly higher than in comparable engines, this might indicate the presence of documents containing many “junk” terms. This might be because the documents have been incorrectly extracted, or because the OCR process has been unsuccessful. (You can use the TermGetAll action to investigate terms with unusually low or high document occurrences.)
If the number of fieldcodes increases suddenly, or is significantly higher than in comparable engines, this might indicate the presence of documents containing many “junk” metadata fields.
Unexpected values for mindatestring or maxdatestring might indicate that the engine’s DateFormatCSVs parameter is misconfigured, or that incoming documents lack date information, or contain dates and times in an unexpected format.
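The numeric checks above can be automated by comparing successive GetStatus snapshots. The following Python sketch assumes that the relevant values have already been extracted into dictionaries of integers. The function name and both thresholds (compact_ratio and growth_factor) are hypothetical and need tuning for each deployment.

```python
def content_warnings(current, previous=None, compact_ratio=0.5, growth_factor=2.0):
    """Return warning strings derived from Content GetStatus counters.

    `current` and `previous` map counter names (documents, committed_documents,
    total_terms, fieldcodes) to integers.
    """
    warnings = []
    documents = current["documents"]
    committed = current["committed_documents"]
    # Far fewer documents than committed documents suggests compaction is due.
    if committed > 0 and documents < compact_ratio * committed:
        warnings.append(
            "consider compacting: documents is much smaller than committed_documents"
        )
    # A sudden jump in terms or field codes can point to junk content.
    if previous is not None:
        for key in ("total_terms", "fieldcodes"):
            before = previous.get(key, 0)
            after = current.get(key, 0)
            if before > 0 and after > growth_factor * before:
                warnings.append(f"{key} grew from {before} to {after}")
    return warnings
```

Comparing against a previous snapshot only catches sudden growth. Detecting values that are high relative to comparable engines would need a separate baseline.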
You can check the timestamp and outcome of the last validation run on various indexes, as well as the timestamp for the last run-time error (that is, any error returned from that index during the course of a query or index command).