Importing a Standard Character Set Not Supported by Relativity

  1. FTP to the site ftp.unicode.org.
  2. Change to the directory Public/MAPPINGS.
  3. Locate the character set you want, as follows:
    • If the character set is a standard Microsoft Windows Code Page, change to the directory VENDORS/MICSFT/WINDOWS.
    • If the character set is a standard Microsoft DOS Code Page, change to the directory VENDORS/MICSFT/PC.
    • If the character set is an ISO 8859 character set, change to the directory ISO8859.
    • If the character set is none of the above, explore the other directories in ftp.unicode.org/Public/MAPPINGS to locate the desired character set.
    • If you cannot find the character set you want, you must define a nonstandard character set or modify an existing Relativity character set.
  4. Locate the file for the desired character set and download the file to your local machine. Name it with the extension .cs.
  5. Edit the file with a text editor and delete any control characters, which are usually marked with the characters ^D.
  6. Insert the following line at the beginning of the file:

    Charset "Character Set Name" 0x0

    For Character Set Name, substitute the name that is to appear for this character set in the Select Character Set dialog box.

  7. At the end of the file, insert the following line:

    EndCharset

  8. Examine the remaining lines for any that are missing the second entry on the line. The second entry is the 16-bit Unicode character for the character being defined. In the definitions available on ftp.unicode.org, unused Code Points have a blank second entry and carry the comment UNDEFINED. On each such line, insert <NOT USED> as the second entry.
    Note: A Unicode character may be used only once in a character set. If it appears twice, an error indicating a duplicate Unicode character will be generated during import. Unique Unicode characters are necessary for all defined Code Points in order to generate mapping tables that can be used to translate characters both to and from the new character set. Rather than arbitrarily assigning Unicode characters to unused Code Points, either leave them out of the character set definition, or use <NOT USED> as the Unicode character entry. When Relativity encounters the <NOT USED> entry, it creates an association in the character set being defined with unused or unmatched entries in the target character set. In this manner, if an undefined character is present in the data, it will be translated consistently.
  9. Save the file and import it into the Relativity data source.
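
The manual edits in steps 5 through 8 can also be scripted. The following Python sketch is one possible way to do so and is not part of Relativity. It assumes that the downloaded mapping file uses whitespace-separated entries with # starting a comment, that the control characters to delete are the actual control bytes (which editors display as ^D, ^Z, and so on), and that comment lines may remain in the .cs file. The function name mapping_to_cs is hypothetical.

```python
import re

# Control characters other than tab and newline (assumption: these are the
# characters that step 5 asks you to delete). Carriage returns are included.
_CONTROL = re.compile(r"[\x00-\x08\x0b-\x1f\x7f]")


def mapping_to_cs(text, name):
    """Return the text of a mapping file rewritten as described in steps 5-8."""
    lines = _CONTROL.sub("", text).split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty piece after the final newline

    out = ['Charset "%s" 0x0' % name]  # step 6: header line
    for line in lines:
        line = line.rstrip()
        data, sep, comment = line.partition("#")
        fields = data.split()
        if len(fields) == 1 and fields[0].lower().startswith("0x"):
            # Step 8: a code point with no second entry gets <NOT USED>.
            line = fields[0] + "\t<NOT USED>"
            if sep:
                line += "\t#" + comment
        out.append(line)
    out.append("EndCharset")  # step 7: trailer line
    return "\n".join(out) + "\n"
```

To use it, read the downloaded file into a string, call mapping_to_cs with the name you want shown in the Select Character Set dialog box, and write the result to a file with the extension .cs. Review the output in a text editor before importing it.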

For a list of errors that can occur when a character set is imported into a Relativity data source, see Error Messages when Importing a Character Set.
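
The duplicate Unicode character condition described in the note to step 8 can be checked before import. The following Python sketch is an illustration only and is not a Relativity tool. It assumes the same whitespace-separated layout with # comments, and it ignores <NOT USED> entries on the assumption that the placeholder may appear on more than one line. The function name find_duplicate_unicode is hypothetical.

```python
from collections import defaultdict


def find_duplicate_unicode(cs_text):
    """Map each Unicode value used more than once to its code points."""
    seen = defaultdict(list)
    for line in cs_text.splitlines():
        fields = line.split("#", 1)[0].split()
        # Skip comments, blank lines, and the Charset/EndCharset lines.
        if len(fields) < 2 or not fields[0].lower().startswith("0x"):
            continue
        try:
            value = int(fields[1], 16)  # second entry: the Unicode character
        except ValueError:
            continue  # not a hexadecimal value, for example <NOT USED>
        seen[value].append(fields[0])
    return {value: cps for value, cps in seen.items() if len(cps) > 1}
```

An empty result means that no Unicode character was found twice under these assumptions. Any entry in the result names the Code Points that share a Unicode character, so you can correct them before importing the file.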