Creating, editing and deleting origins
This function allows you to import data from ControlPoint, Text files, Folders and Email Capture. For background information see Origins.
- On the Manage ribbon, click Origins.
The Origins - all dialogue box appears.
- Right-click in the dialogue box and on the New menu, choose one of the following:
- Text file
- Folder
- Mailbox on Exchange Server
- Mailbox on a Lotus Notes Server
- Mailbox on a Groupwise Server
- SharePoint Site
- XML File
- ControlPoint
- Email Capture
- Generic
- Generic (Manage in Place)
- Fill in the fields in the tabs described below and click OK to save the new origin.
Content Manager saves the new origin and it appears in the list with its source.
- On the Manage ribbon, click Origins.
The Origins - all dialogue box appears.
- Right-click the origin to edit and click Properties.
The Origin - <NAME> dialogue box appears.
- Edit the fields in the tabs described below and click OK.
Content Manager saves the changes to the origin.
The tabs and fields that are available depend on the type of origin.
- Name - name of the origin, e.g. Invoices
- Run On Computer - click the button to select the computer that Content Manager Bulk Loader should run on.
Depending on the number of records that Content Manager Bulk Loader will process, specify a computer that has enough processing resources to carry out the task within a timeframe acceptable to you.
Consider using a server if you expect this computer to process large numbers of documents regularly. It is difficult to give exact recommendations, as hardware specifications of computers and networks vary considerably. Carry out tests in your own environment to determine the setup that suits your needs.
- Automatically start processing on startup - select to start processing documents in the selected folder as soon as Content Manager starts
- Enter the text file / Windows folder containing the data - click to select the input source.
The label depends on whether you are creating a new origin with a .txt file source or a folder source.
- Managed-in-place document store - only available for Generic (Manage in Place) origins - click the KwikSelect to search for and select the Manage In Place (Windows File System) document store that the Origin will ingest documents from.
Use this tab to set the default values that Content Manager should use for new records when it creates them from the origin source.
- Record Type - mandatory - every record in Content Manager has a Record Type.
- Classification - click the KwikSelect to search for and select, or select from the drop-down recents list, the Classification to be attached to the records created using this Origin.
- Create containers when importing documents - select this option to enable Content Manager to automatically create containers for the imported records. When selected, the Configure Settings option is enabled, to set up the container creation rules. Click Configure Settings to display the Create Container Settings dialogue. See Container Creation Settings for details on how to configure the creation of containers when adding documents using this Origin.
- Or use this container - click the KwikSelect to search for a container, or use the drop-down arrow to select a recent container, that the imported records will be contained within when created.
- Home - the record Home Location
- Owner - the record Owner Location
- Creator - the record Creator Location
- Author - the record Author Location
- Retention Schedule - click the KwikSelect to search for and select, or select from the drop-down recents list, the Retention Schedule to be attached to the records created using this Origin.
- Location Match Type - an indication of how Locations should be handled by the application that employs the BulkDataLoader SDK class. A suggestion for how the SDK application should handle each setting follows:
- Matches must be unique - the SDK application could use a unique property like email address to match the Location. The SDK application should attempt to use first name, surname, date of birth etc. to match only one Location.
- Use first match - the SDK application should use the first matching Location, or create a new Location if no match is found
- Never match - always create new - the SDK application should always create a new Location without searching for an existing Location
- Must match - never create a new Location - the SDK application should only use existing Locations. If no matching Location is found, it should not attempt to create one.
NOTE: The Location Match Type option is not available for origins of type Email Capture and Folder.
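As a rough illustration of how an SDK application might interpret the four settings, the following Python sketch resolves a Location against a list of existing ones. It is not the BulkDataLoader SDK API: MatchType, Location and resolve_location are hypothetical names, and matching on name or email address is a simplification. The handling of the "Matches must be unique" edge cases (an error when several Locations match, a new Location when none does) is an assumption, since that behaviour is not specified here.

```python
from dataclasses import dataclass
from enum import Enum


class MatchType(Enum):
    """Hypothetical stand-ins for the four Location Match Type settings."""
    UNIQUE = "Matches must be unique"
    FIRST = "Use first match"
    NEVER = "Never match - always create new"
    MUST = "Must match - never create a new Location"


@dataclass
class Location:
    name: str
    email: str


def resolve_location(name, email, existing, match_type):
    """Return (location, created); location is None when nothing may be used."""
    if match_type is MatchType.NEVER:
        # Do not search at all: always create a new Location.
        return Location(name, email), True

    if match_type is MatchType.UNIQUE:
        # Match on a unique property (here: the email address).
        matches = [loc for loc in existing if loc.email == email]
        if len(matches) > 1:
            # Assumption: an ambiguous match is reported, not guessed.
            raise ValueError(f"{email} matches more than one Location")
    else:
        matches = [loc for loc in existing if loc.name == name]

    if matches:
        return matches[0], False
    if match_type is MatchType.MUST:
        # Only existing Locations may be used, so give up.
        return None, False
    # FIRST, and (by assumption) UNIQUE: create when nothing matched.
    return Location(name, email), True
```

For example, with two existing Locations both named "Ann Lee", "Use first match" returns the first one, while "Matches must be unique" picks the one whose email address matches.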
Available for Origins of Text and XML File types - use this tab to choose the record fields that Content Manager should use for new records when it creates them from the origin source.
The Actual items list on the right shows the current selection.
To add a field to the list, select it in the Available items list on the left and click Add.
To remove a field from the list, select it and click Remove.
Available for Origins of Text and XML File types - use this tab to choose the Location fields that Content Manager should use for new Locations when it creates them from the origin source.
The Actual items list on the right shows the fields currently used for new records created from the source.
To add a field to the list, select it in the Available items list on the left and click Add.
To remove a field from the list, select it and click Remove.
Available for Email Capture Origins - this is a tool designed for the bulk capture of emails in Content Manager.
Email Capture architecture
1. Configure an IIS 6 SMTP Server
- Make sure this server is available from your mail server
- Register a domain for the server
- Register this domain in the SMTP server
- Configure a drop directory for the domain (this will cause all email sent to this domain to be written out to this folder)
2. Create a new Origin in Content Manager
- Origin type Email Capture, with the path of the drop folder as set above
- Set any other defaults and container behaviour as desired
3. Install TRIMEmailManager.exe on the same server as you installed IIS 6 SMTP
- This will be installed as a windows service and will reference an XML configuration file to get the Workgroup Server, Port and Dataset ID based on values captured from the install process (Setup.exe). The remaining configuration options are set via the client interface using the Email Capture origin object.
4. Create a journaling rule to ‘journal’ all email for selected users
- From the Exchange Admin console, go to the Journal Rules section
- Create a new journal rule
- Enter any email address at the domain used in step 1 above (e.g. mailarchive@my.mailarchive.acme.com) in the ‘Send journal reports to’ field
- Give the rule a name
- Select which users, or group of users to journal
- Select which messages to journal (internal, external or all)
5. Once the above is done, the process should be:
- A user sends or receives an email
- A copy of that email is forwarded to your SMTP server by the Exchange Journaling engine
- TRIMEmailManager.exe imports the email
- Thus all mail (sent and received) for one or more users will be filed.
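Before relying on journaling, it can help to send a test message to an address at the registered domain and confirm that a file appears in the drop directory. The sketch below uses Python's standard smtplib and email modules. The function names and the host name used in the usage note are placeholders, not part of Content Manager.

```python
import smtplib
from email.message import EmailMessage


def build_test_message(sender, recipient):
    """Build a plain-text message addressed to the capture domain."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Email Capture drop directory test"
    msg.set_content("Test message: this should appear in the drop directory.")
    return msg


def send_test_message(smtp_host, msg, port=25):
    """Hand the message to the SMTP server that owns the drop directory."""
    with smtplib.SMTP(smtp_host, port, timeout=30) as server:
        server.send_message(msg)
```

For example, send_test_message("smtp.example.internal", build_test_message("tester@acme.com", "mailarchive@my.mailarchive.acme.com")) should leave one new file in the drop folder if the domain and drop directory are configured as described above.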
System Options - Email Records setting
If all email for a particular user is being filed to Content Manager, the user may still be filing email manually. In that case, the customer can choose one of two options:
- Keep the journaled email independently of any manually filed instance of this email.
- Delete the journaled email if a user manually files the same email. See System Options Email and Chat page.
Electronic Property - Email Message ID
Any email that is filed via TRIMEmailManager.exe using an ‘Email Capture’ origin will have a special prefix on the Message ID. This prefix is what allows Content Manager to find imported emails and delete them when an instance of the same email is filed manually.
- Include journal headers - When the journaling system captures an email, it puts a wrapper around the original message, and includes a number of its own headers to help identify where the message came from, as well as how the target user was involved, i.e. did they send the email, were they a recipient (To), a Cc contact, or a Bcc contact. If this option is selected, then this wrapper information is kept and the original email will be embedded as an attachment. Enabling this option also speeds up processing, as less parsing of the email file is required.
- Create contacts from email recipients - If this option is selected, then any recipients found in the message (i.e. To, Cc, Bcc) will be added as contacts for the record in Content Manager. There should be no noticeable performance impact when this option is enabled.
- Only create contacts for existing locations - If this option is selected, Content Manager attaches to the record only those email recipients that are already Content Manager Contacts. It does not create new Contacts from email addresses that are not already Content Manager Contacts.
- Check for duplicate email messages - When mail messages are imported from the source folder, their Message-IDs are stored (approximately the last 10,000), and any email with a Message-ID that is already stored is regarded as a duplicate. If this option is enabled, the duplicate file is discarded, as it is likely to be the same email. Duplicates can arise if two users in the recipient list are both being journaled.
- Use bulk data loader - This option is a performance setting. If it is not selected, then each record is created one at a time, and this incurs many calls to the database. This can be useful for testing purposes, but will be very slow in production. Once the system has been verified to perform appropriately, this option should be enabled. Any problems with bulk loading will usually come down to an incorrect or missing folder in Content Manager Enterprise Studio for the dataset being targeted, or permissions on the folder.
- Quarantine folder - Quarantined eml files are moved to this folder. If any error is encountered while attempting to parse a particular email file, it will be moved to the quarantine folder. If the same file is detected as already existing in the quarantine folder, then it is simply removed from the source folder.
- Logging level - Three options are provided: Off, On and Detailed. This controls the level of log output for TRIMEmailManager.exe. For normal operation, it should be set to “On”, and if there are problems during setup, it should be set to “Detailed”. It should not be left at the Detailed level in production, as this could affect performance and create overly large log files.
- Filter rules - The original emails embedded in the journaled message can carry headers that identify properties of the original email, for example that it is spam. The filtering rules allow specific headers to be matched. If a header matches, the email file is deleted or, if the 'Keep filtered messages' option is enabled, moved to a folder where these emails are stored for further inspection.
- Check email header - Specify the email header to check.
- For matching value - Specify the matching value to check in the email header.
- Use regular expression - Enable to treat the matching value as a regular expression.
- Match case - Enable to match case.
- Keep filtered messages - Enable and specify the Filtered message folder to allow you to inspect filtered messages.
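The duplicate check and the filter rules described above can be pictured with a short Python sketch. It is only an illustration under stated assumptions, not how TRIMEmailManager.exe is implemented: RecentMessageIds, header_matches and triage are hypothetical names, the X-Spam-Flag header in the usage note is just a common example, and treating a non-regex rule as a whole-value comparison is an assumption.

```python
import re
from collections import OrderedDict
from email import message_from_string


class RecentMessageIds:
    """Remembers roughly the last `capacity` Message-IDs seen."""

    def __init__(self, capacity=10_000):
        self.capacity = capacity
        self._ids = OrderedDict()

    def seen_before(self, message_id):
        """Record the ID and report whether it was already stored."""
        if message_id in self._ids:
            return True
        self._ids[message_id] = None
        if len(self._ids) > self.capacity:
            self._ids.popitem(last=False)  # forget the oldest ID
        return False


def header_matches(msg, header, value, use_regex=False, match_case=False):
    """Apply one filter rule (header, matching value, options) to a message."""
    actual = msg.get(header)
    if actual is None:
        return False
    actual = actual.strip()
    if use_regex:
        flags = 0 if match_case else re.IGNORECASE
        return re.search(value, actual, flags) is not None
    if match_case:
        return actual == value
    return actual.lower() == value.lower()


def triage(raw_email, recent_ids, rule, keep_filtered=False):
    """Decide what happens to one email file from the source folder."""
    msg = message_from_string(raw_email)
    message_id = msg.get("Message-ID")
    if message_id is not None and recent_ids.seen_before(message_id):
        return "discard-duplicate"
    if header_matches(msg, **rule):
        return "move-to-filtered-folder" if keep_filtered else "delete"
    return "import"
```

A rule such as {"header": "X-Spam-Flag", "value": "yes"} would then cause matching messages to be deleted, or moved aside when keep_filtered is set, while a second copy of an already imported message is discarded as a duplicate.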
Available for Folder Origins - use these settings to exclude or ignore files of particular types as a part of the Origin processes.
Select the file types that are to be ignored:
- Hidden Files
- System files
- Content Manager reference files
- Binary files
- Rendition files
To add files to be ignored, in the Ignore Files in the following list section, right-click and click Add. Navigate to the file to be added to the ignore list and click Open. Repeat for all files that should be ignored by the Origin.
To add a subfolder to be ignored, in the Ignore subfolders in the following list section, right-click and click Add. Navigate to the subfolder to be added to the ignore list and click OK. Repeat for all subfolders that should be ignored by the Origin.
To remove files or subfolders from the lists, select the file or subfolder that should no longer be in the ignore list (tagging multiple items if required), right-click and click Delete.
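The effect of the two ignore lists can be sketched in Python as follows. This is an illustration only: is_ignored is a hypothetical helper, the example paths are invented, and the sketch covers just the explicit file and subfolder lists, not the file type check boxes. It also assumes that ignoring a subfolder ignores everything beneath it.

```python
from pathlib import PureWindowsPath


def is_ignored(path, ignore_files, ignore_subfolders):
    """Return True if `path` is on the file list or under an ignored subfolder."""
    path = PureWindowsPath(path)
    # Exact match against the "Ignore files" list.
    if any(path == PureWindowsPath(f) for f in ignore_files):
        return True
    # Any ancestor folder on the "Ignore subfolders" list excludes the file.
    return any(PureWindowsPath(d) in path.parents for d in ignore_subfolders)


candidates = [
    r"C:\Drop\a.txt",
    r"C:\Drop\skip.txt",
    r"C:\Drop\Temp\b.txt",
    r"C:\Drop\Temp\Deep\c.txt",
]
kept = [
    p for p in candidates
    if not is_ignored(p, [r"C:\Drop\skip.txt"], [r"C:\Drop\Temp"])
]
```

Here only C:\Drop\a.txt remains in kept; the listed file and everything under the ignored subfolder are skipped.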
The bulk loader process submits a number of records to the database in a batch transaction.
- Batch size - This option specifies how many records to include in the batch. To get good performance, the batch size should be reasonably large, anywhere from 1,000 to 100,000 records. During processing, temporary text files (with a .DAT extension) containing all the fields required to insert the data into the database are constructed before being submitted to the database server. One file is created for each table affected. The larger the batch size, the larger these temporary text files become. After the batch has been submitted to the database server, the files are removed. Due to the nature of bulk inserting into databases, the batch either succeeds or fails as a whole, and any errors that occur cannot be traced to a specific record. If there is a problem with submitting certain records, it may be necessary to disable bulk loading (in the Email Capture tab), or set the batch size to 1, in an attempt to isolate the offending record.
- Do content indexing - If checked, this option enables content indexing of the records once the batch has been submitted. This could slow down the overall import rate depending on how the indexing system has been configured. If this option is disabled, the content indexing can be performed at a later date using the origin history to identify records for indexing.
- Do word indexing - If this option is not enabled, then no word indexing of the records occurs. If it is enabled, then word indexing is performed when the batch is processed. This can cause performance issues on the workgroup server if a large number of records are being loaded.
- Use events for word indexing - If this option is enabled, an event is created for each origin history object (one per run) and all records generated for that particular run are indexed. If the run continues for more than a day, it is ended so that the word indexing can occur. A run can also finish in other ways, for example when the service is stopped and restarted, or when certain configuration changes on the origin cause a new run to be initiated. It can therefore take up to a day for the word indexing to finish. The run restarts at 12:00 AM local time every day if this option is enabled.
- Use direct path load (Oracle) - This is an Oracle-only setting. It is a performance setting for optimizing the data loading capability with Oracle and the SQL*Loader tool. Refer to Oracle documentation for more information.
- Use TABLOCK bulk insert option (SQL Server) - This is a SQL Server only setting. This is a performance setting which locks an entire table during the bulk-import operation. Holding a lock for the duration of the bulk-import operation reduces lock contention on the table, and in some cases can significantly improve performance.
- Use CHECK_CONSTRAINTS bulk insert option (SQL Server) - Specifies that all constraints on the target table or view must be checked during the bulk-import operation. Without the CHECK_CONSTRAINTS option, any CHECK and FOREIGN KEY constraints are ignored and after the operation, the constraint on the table is marked as not-trusted. Enabling this option can slow the rate of loading.
- Use special bulk data loader numbering - If enabled, this option uses a special numbering sequence as a performance optimization. The records are placed into a separate folder in the electronic store.
- Stream bulk insert data files to the workgroup server - If TRIMEmailManager.exe is not running on the same host as the database server, or does not share a network drive with the database server, then this option allows the workgroup server to transfer the bulk insert data files to a location that is accessible to the database server. A folder or network share must be configured in Content Manager Enterprise Studio for this purpose (Dataset Properties > Options > Work Path for Bulk Loading).
- Mode for transferring documents - If TRIMEmailManager.exe is located on a host that has access to the electronic store, via a network share, or via the local disk, then it can use the Copy or Move modes to transfer documents to the electronic store. The fastest method of loading documents is the Move mode, but this is only the case if the documents are on the same volume as the electronic store. If the electronic store is not on the same volume, then 'Move' is equivalent to 'Copy'. If the host does not have direct access to the electronic store, then the 'Workgroup Server Transfer' mode must be used, and this is the slowest way to load documents.
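Because a batch succeeds or fails as a whole, a batch size of 1 is what makes an offending record identifiable. The Python sketch below illustrates that idea with a stand-in submit function; it is not the bulk loader's own logic. BatchError, batches and load are hypothetical names, and the automatic one-by-one retry is an assumption made for the example (in Content Manager the batch size is changed manually on the origin).

```python
class BatchError(Exception):
    """Raised by the stand-in submit function when a whole batch is rejected."""


def batches(records, batch_size):
    """Split the records into consecutive batches of at most batch_size."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


def load(records, batch_size, submit):
    """Submit records in batches and return the records that were rejected."""
    rejected = []
    for batch in batches(records, batch_size):
        try:
            submit(batch)
        except BatchError:
            # The failure says nothing about which record caused it,
            # so resubmit this batch with an effective batch size of 1.
            for record in batch:
                try:
                    submit([record])
                except BatchError:
                    rejected.append(record)
    return rejected
```

With a submit function that rejects any batch containing a bad record, load(["a", "b", "bad", "c"], 3, submit) loads the three good records and returns the one that could not be inserted.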
Optionally, you can use the Notes tab to write notes about this origin.
- Right-click the origin to delete.
A confirmation message appears.
- Click Yes.
Content Manager deletes the origin.