Defining a Nonstandard Character Set

  1. Using a text editor, create a new file to hold the character set definition. Give the file the .cs extension.
  2. Insert the following line at the beginning of the file:

    Charset "Character Set Name" 0x0

    Replace Character Set Name with the name that should appear for this character set in the Select Character Set dialog box.

  3. Insert up to 256 lines into the file, one for each character in the character set being described. Each line must contain two entries separated by a space or a tab, optionally followed by a third entry that is also separated by a space or a tab. The first entry is the 8-bit Code Point value, that is, the numeric value of the character in the character set being described. The second entry is the 16-bit Unicode value of that character. The optional third entry is a comment, preceded by a hash (#) character.

    For example, if the character set had a Euro Sign at position 128, the line for it would be:

    128 8364 #EURO SIGN

    Or, using hexadecimal notation:

    0x80 0x20AC #EURO SIGN

    For Code Points that have no defined character, use the text <NOT USED> as the second entry.

    Note: A Unicode character may be used only once in a character set. If it appears twice, the import generates an error indicating a duplicate Unicode character. Every defined Code Point needs a unique Unicode character so that mapping tables can be generated to translate characters both to and from the new character set. Rather than arbitrarily assigning Unicode characters to unused Code Points, either leave them out of the character set definition or use <NOT USED> as the Unicode character entry. When Relativity encounters the <NOT USED> entry, it creates an association in the character set being defined with unused or unmatched entries in the target character set. In this manner, if an undefined character is present in the data, it will be translated consistently.
  4. Following the lines describing the character set, insert the following line:

    EndCharset

  5. Save the file and import it into the Relativity data source.
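
The steps above can be sketched as a small Python helper that produces the text of a definition file. This is only an illustration of the file layout described in steps 2 through 4: the function name build_charset_definition, its parameters, and the sample mapping are hypothetical and are not part of Relativity.

```python
NOT_USED = "<NOT USED>"


def build_charset_definition(name, mapping, comments=None):
    """Return the text of a character set definition file.

    mapping  -- dict of 8-bit Code Point -> 16-bit Unicode value,
                or None for a Code Point with no defined character
    comments -- optional dict of Code Point -> comment text
    (Both parameters are assumptions made for this sketch.)
    """
    comments = comments or {}
    lines = [f'Charset "{name}" 0x0']          # step 2: header line
    for code_point in sorted(mapping):         # step 3: one line per character
        if not 0 <= code_point <= 0xFF:
            raise ValueError(f"Code Point {code_point} is not an 8-bit value")
        unicode_value = mapping[code_point]
        if unicode_value is None:
            entry = f"0x{code_point:X} {NOT_USED}"
        else:
            if not 0 <= unicode_value <= 0xFFFF:
                raise ValueError(f"Unicode value {unicode_value} is not a 16-bit value")
            entry = f"0x{code_point:X} 0x{unicode_value:X}"
        comment = comments.get(code_point)
        if comment:
            entry += f" #{comment}"            # optional third entry
        lines.append(entry)
    lines.append("EndCharset")                 # step 4: closing line
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = build_charset_definition(
        "Example Character Set",
        {0x80: 0x20AC, 0x81: None},
        {0x80: "EURO SIGN"},
    )
    print(sample, end="")
```

The returned text can then be saved to a file with the .cs extension and imported as described in step 5.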

To see a list of errors that can occur when a character set is imported into a Relativity data source, see Error Messages when Importing a Character Set.
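
Some problems can be caught before import. The following Python sketch scans a definition file for the duplicate Unicode condition described in the Note above, along with entries that fall outside the 8-bit and 16-bit ranges. It is a hypothetical pre-check only: the function names and message wording are assumptions, and it does not reproduce Relativity's importer or its actual error messages.

```python
NOT_USED = "<NOT USED>"


def parse_number(token):
    """Parse a decimal or 0x-prefixed hexadecimal entry."""
    token = token.strip()
    if token.lower().startswith("0x"):
        return int(token, 16)
    return int(token, 10)


def find_problems(text):
    """Return a list of problem descriptions found in definition text."""
    problems = []
    unicode_owner = {}  # Unicode value -> Code Point that first used it
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()    # drop the optional comment
        if not line or line.startswith("Charset") or line == "EndCharset":
            continue                           # skip blank, header, and closing lines
        parts = line.split(None, 1)            # entries are space or tab separated
        if len(parts) != 2:
            problems.append(f"line {line_no}: expected a Code Point and a Unicode entry")
            continue
        try:
            code_point = parse_number(parts[0])
            unicode_value = None if parts[1] == NOT_USED else parse_number(parts[1])
        except ValueError:
            problems.append(f"line {line_no}: could not parse a numeric entry")
            continue
        if not 0 <= code_point <= 0xFF:
            problems.append(f"line {line_no}: Code Point is not an 8-bit value")
        if unicode_value is None:
            continue                           # <NOT USED> may repeat freely
        if not 0 <= unicode_value <= 0xFFFF:
            problems.append(f"line {line_no}: Unicode entry is not a 16-bit value")
        if unicode_value in unicode_owner:
            problems.append(
                f"line {line_no}: Unicode 0x{unicode_value:X} is already used "
                f"by Code Point 0x{unicode_owner[unicode_value]:X}"
            )
        else:
            unicode_owner[unicode_value] = code_point
    return problems
```

An empty list means the sketch found nothing to report; it does not guarantee that the import itself will succeed.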