ENTMF 

The UVALID Function

If a character string consists of valid Unicode UTF-8 or UTF-16 data, the UVALID function returns the value zero. If a character string contains invalid Unicode data, the UVALID function returns the index of the first invalid element.

The function type is integer.

General Format

FUNCTION UVALID ( argument-1 )

Arguments

  1. Argument-1 must be of class alphabetic, alphanumeric, national, or UTF-8.

Returned Values

  1. The returned value is an integer, which differs based on argument-1:
    • If argument-1 is of class alphabetic, alphanumeric, or UTF-8, and it consists of valid UTF-8 encoded Unicode data, the returned value is zero.
    • If argument-1 is of class alphabetic, alphanumeric, or UTF-8, and it contains invalid UTF-8 encoded Unicode data, the returned value is the position of the first byte where the invalid UTF-8 data starts.
    • If argument-1 is of class national, and it consists of valid UTF-16 encoded Unicode data, the returned value is zero.
    • If argument-1 is of class national, and it contains invalid UTF-16 encoded Unicode data, the returned value is the position of the first UTF-16 encoding unit where the invalid UTF-16 data starts. This position is one plus the number of well-formed UTF-16 encoding units that precede the invalid data.
Note: The UVALID function indicates whether the character string contains well-formed Unicode UTF-8 or UTF-16 data. It does not indicate whether any or all of the Unicode code points represented by the character string are assigned to characters.
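
The return-value rules above can be modeled outside COBOL. The following Python sketch is not the UVALID implementation. It assumes that Python's strict UTF-8 decoder applies the same well-formedness rules as the function, and the names uvalid_utf8 and uvalid_utf16 are hypothetical helpers introduced only for illustration. It shows how the 1-based position is derived for byte data and for UTF-16 encoding units.

```python
def uvalid_utf8(data: bytes) -> int:
    """Model of the rule for alphabetic, alphanumeric, or UTF-8 arguments.

    Returns 0 for well-formed UTF-8, otherwise the 1-based position of
    the first byte where the invalid data starts.
    """
    try:
        data.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError as err:
        return err.start + 1  # err.start is a 0-based byte offset
    return 0


def uvalid_utf16(units: list) -> int:
    """Model of the rule for national arguments, given 16-bit code units.

    Returns 0 for well-formed UTF-16, otherwise one plus the number of
    well-formed encoding units that precede the invalid data.
    """
    i = 0
    while i < len(units):
        unit = units[i]
        if 0xD800 <= unit <= 0xDBFF:
            # A high surrogate must be followed by a low surrogate.
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2
                continue
            return i + 1
        if 0xDC00 <= unit <= 0xDFFF:
            # A low surrogate with no preceding high surrogate.
            return i + 1
        i += 1
    return 0


print(uvalid_utf8(b"AB"))                       # 0
print(uvalid_utf8(b"A\xc3\x28B"))               # 2
print(uvalid_utf16([0x0041, 0xD83D, 0xDE00]))   # 0
print(uvalid_utf16([0x0041, 0xDE00]))           # 2
```

In the second call, X'C3' starts a two-byte sequence but X'28' is not a continuation byte, so the invalid data starts at byte 2. In the last call, one well-formed encoding unit precedes the unpaired low surrogate, so the position is 2.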

Comments

This function supports ideographic variation selectors (IVS), which allow font software to select a glyph other than the default. (If no variation exists or is supported, the font software ignores the selector.) An IVS consists of Unicode characters in the range U+E0100 through U+E01EF. In UTF-16 strings these are encoded as surrogate pairs in the range 0xDB40 0xDD00 through 0xDB40 0xDDEF, and in UTF-8 strings as the four-byte sequences 0xF3A08480 through 0xF3A087AF.
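
The encoded ranges quoted above follow from the code point range. As a quick check, this Python sketch (illustrative only, not part of COBOL; encode_hex is a hypothetical helper name) encodes the first and last code points of the range, U+E0100 and U+E01EF, in UTF-16 and UTF-8.

```python
def encode_hex(code_point: int) -> tuple:
    """Return the UTF-16 (big-endian) and UTF-8 encodings as hex strings."""
    ch = chr(code_point)
    utf16 = ch.encode("utf-16-be").hex().upper()
    utf8 = ch.encode("utf-8").hex().upper()
    return utf16, utf8


for cp in (0xE0100, 0xE01EF):
    utf16, utf8 = encode_hex(cp)
    print(f"U+{cp:05X}  UTF-16: {utf16}  UTF-8: {utf8}")
# U+E0100  UTF-16: DB40DD00  UTF-8: F3A08480
# U+E01EF  UTF-16: DB40DDEF  UTF-8: F3A087AF
```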

When the function processes its argument, a Unicode character followed by an IVS is treated as a single character.