When you deploy connectors and CFS, check that the data is extracted and processed as expected. If necessary, you can customize the documents produced by CFS.
CFS includes an application (filtertest.exe) that filters a file using KeyView and writes the output to a text file. You can take a sample file from your repository and see how the file is processed. This test can provide useful information about how you might need to configure CFS.
When you start extracting data, configure CFS to output documents as files (you can output documents as XML or IDX files). You can then see how the data from a repository is used to build documents, and verify that the data is processed as expected.
For example:
Check that the data you want to extract has been extracted successfully.
Check that the data has been written to the correct fields. The field names that you use to store extracted data are particularly important for some IDOL operations, for example Parametric Search.
Check that the content is valid. If you notice that documents contain hexadecimal codes (for example 0x456), it is likely that the original characters have been escaped.
Text returned by KeyView is encoded as UTF-8. KeyView attempts to determine the encoding of files that are processed, but if it cannot determine the correct encoding, CFS escapes any characters that are not valid UTF-8.
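The content check above can be partly automated. The following is a minimal sketch, not part of CFS, that scans a folder of output documents for tokens that look like hexadecimal codes. It assumes that escaped characters appear in the form shown in the example (0x followed by hexadecimal digits) and that the output files use .xml or .idx extensions; the function names and the folder layout are hypothetical.

```python
import re
from pathlib import Path

# Assumption: escaped characters look like the example in the text (0x456).
HEX_CODE = re.compile(r"\b0x[0-9A-Fa-f]+\b")


def find_hex_codes(text):
    """Return every token in text that looks like a hexadecimal code."""
    return HEX_CODE.findall(text)


def scan_output_folder(folder, patterns=("*.xml", "*.idx")):
    """Map each output file name to the hexadecimal codes found in it.

    Files without any matches are left out of the report.
    """
    report = {}
    for pattern in patterns:
        for path in sorted(Path(folder).glob(pattern)):
            # Replace undecodable bytes rather than failing on a bad file.
            text = path.read_text(encoding="utf-8", errors="replace")
            hits = find_hex_codes(text)
            if hits:
                report[path.name] = hits
    return report
```

A match is only a hint that a character might have been escaped. Legitimate content, such as source code or memory addresses, can also contain values of this form, so review each reported file by hand.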
When testing connectors and CFS, you might want to index the same files many times and observe how the data is processed. However, connectors keep a record of the files that have been retrieved. A connector creates a datastore (.db) file for each fetch task that you run. To retrieve the same files again, you must delete this record.
You can delete the datastore (.db) files in the connector installation folder, or set configuration parameters in the connector’s configuration file so that the connector does this automatically (see Configuration Parameters for Testing Connectors).
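If you delete the datastore files by hand between test runs, a small script can make the step repeatable. The following is a minimal sketch under stated assumptions: the .db files sit directly in the connector installation folder, the connector is stopped before the files are removed, and no other .db files that you need are stored in the same folder. The function name and its dry_run parameter are hypothetical and are not part of any connector.

```python
from pathlib import Path


def delete_datastores(install_dir, dry_run=True):
    """Delete the *.db files directly inside install_dir.

    Returns the names of the matching files. With dry_run=True (the
    default) nothing is deleted, so you can review the list first.
    """
    matched = []
    for path in sorted(Path(install_dir).glob("*.db")):
        if not dry_run:
            path.unlink()
        matched.append(path.name)
    return matched
```

Call the function once with the default dry_run=True to list the files that would be removed, then call it again with dry_run=False after you have confirmed the list. Use this approach on test systems only, because deleting the record causes the connector to retrieve every file again.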