Grammar rule syntax
Grammar rule patterns are defined using standard regular expressions. Grammar categories and grammar rules in Fusion are derived from OpenText Eduction and IDOL. Fusion grammar rules support the following regular expression syntax. The following "Regular Expressions" content is from the Eduction User and Programming Guide, available from https://www.microfocus.com/documentation/idol/ as part of the IDOL 12.12 documentation.
The Eduction engine parser interprets regular expression syntax nearly identically to the UNIX regular expression syntax.
Fusion does not return regular expression capture groups as groups, so do not explicitly exclude them with the `?:` option when creating patterns. For example, use `([0-9]{3})?` instead of `(?:[0-9]{3})?`.
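
The following minimal sketch illustrates why the plain grouping form is enough for matching purposes. It uses Python's `re` module as a stand-in for the Eduction engine, which is an assumption made only for illustration; the pattern (an optional three-digit prefix followed by four digits) and the sample strings are hypothetical.

```python
import re

# Hypothetical pattern: an optional 3-digit prefix followed by 4 digits.
CAPTURING = r"([0-9]{3})?[0-9]{4}"
NON_CAPTURING = r"(?:[0-9]{3})?[0-9]{4}"


def matched_text(pattern, text):
    """Return the full text matched by pattern, or None if the text does not match."""
    m = re.fullmatch(pattern, text)
    return m.group(0) if m else None


samples = ["5551234", "1234", "12"]

# Both forms accept and reject exactly the same strings; only the
# group bookkeeping differs, and that is not returned by Fusion anyway.
capturing_results = [matched_text(CAPTURING, s) for s in samples]
non_capturing_results = [matched_text(NON_CAPTURING, s) for s in samples]
```

In this sketch the two result lists are identical, which is the only property the example is meant to show; it says nothing about how Fusion processes groups internally.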
Operators
The following table describes the base regular expression operators available in the Eduction engine, and the pattern the operator matches.
Operator | Matched Pattern |
---|---|
`\` | Quote the next metacharacter. |
`^` | Match the beginning of a line. |
`$` | Match the end of a line. |
`.` | Match any character (except newline). |
`\|` | Alternation. |
`()` | Used for grouping to force operator precedence. |
`[xy]` | The character `x` or `y`. |
`[x-z]` | The range of characters between `x` and `z`. |
`[^z]` | Any character except `z`. NOTE: For performance reasons, OpenText recommends that you explicitly list all the characters that you want to match, rather than using this operator. NOTE: To use negated character classes in case-insensitive entities, you must include letters in both cases. |
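
The base operators can be tried out with a short sketch such as the one below. It uses Python's `re` module as an approximation of the Eduction parser (an assumption; the two engines are not guaranteed to agree in every case), and the helper `full` and all sample strings are hypothetical.

```python
import re


def full(pattern, text):
    """Return True when the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


# \ quotes the next metacharacter, so \. matches a literal dot only.
literal_dot = [full(r"3\.14", s) for s in ("3.14", "3x14")]

# . matches any character except newline.
any_char = [full(r"c.t", s) for s in ("cat", "c\nt")]

# | is alternation; () groups the alternatives so ^ and $ apply to both.
alternation = [
    re.search(r"^(cat|dog)$", s) is not None for s in ("cat", "dog", "catdog")
]

# [x-z] matches one character in the range x to z.
in_range = [full(r"[x-z]", s) for s in ("y", "w")]

# Negated class that lists both cases of the excluded letter explicitly.
not_z = [full(r"[^zZ]", s) for s in ("a", "z", "Z")]
```

Listing both `z` and `Z` in the last pattern mirrors the note about case-insensitive entities; in Python this is only needed because the sketch does not set a case-insensitive flag.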
Quantifiers
Operator | Matched Pattern |
---|---|
`*` | Match 0 or more times. |
`+` | Match 1 or more times. |
`?` | Match 0 or 1 times. |
`{n}` | Match exactly `n` times. |
`{n,}` | Match at least `n` times. |
`{n,m}` | Match at least `n` times, but no more than `m` times. |
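
As a quick illustration of the quantifiers above, the following sketch checks each one against a few sample strings. It relies on Python's `re` module as a stand-in for the Eduction engine (an assumption), and the helper `full` and the sample patterns are hypothetical.

```python
import re


def full(pattern, text):
    """Return True when the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


# * : 0 or more times
zero_or_more = [full(r"ab*", s) for s in ("a", "abbb", "b")]

# + : 1 or more times
one_or_more = [full(r"ab+", s) for s in ("a", "ab", "abbb")]

# ? : 0 or 1 times
optional = [full(r"colou?r", s) for s in ("color", "colour", "colouur")]

# {n} : exactly n times
exactly = [full(r"[0-9]{3}", s) for s in ("12", "123", "1234")]

# {n,} : at least n times
at_least = [full(r"[0-9]{2,}", s) for s in ("1", "12", "12345")]

# {n,m} : at least n times, but no more than m times
bounded = [full(r"[0-9]{2,4}", s) for s in ("1", "123", "12345")]
```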
Metacharacters
Operator | Matched Pattern |
---|---|
`\t` | Match tab. |
`\n` | Match newline. |
`\r` | Match return. |
`\f` | Match formfeed. |
`\a` | Match alarm (bell, beep, and so on). |
`\e` | Match escape. |
`\v` | Match vertical tab. |
`\021` | Match octal character (in this example, 21 octal). |
`\xF0` | Match hex character (in this example, F0 hex). |
`\x{263a}` | Match wide hex character (Unicode). |
`\w` | Match word character: `[A-Za-z0-9_]`. |
`\W` | Match non-word character: `[^A-Za-z0-9_]`. |
`\s` | Match whitespace character. This metacharacter also includes `\n` and `\r`: `[ \t\n\r]`. |
`\S` | Match non-whitespace character: `[^ \t\n\r]`. |
`\d` | Match digit character: `[0-9]`. |
`\D` | Match non-digit character: `[^0-9]`. |
`\b` | Match word boundary. |
`\B` | Match non-word boundary. |
`\A` | Match start of string (never match at line breaks). |
`\Z` | Match end of string. Never match at line breaks; only match at the end of the final buffer of text submitted for matching. |
`\p{class}` | Match any character that belongs to the specified Unicode character class. For example, `\p{Sc}` matches any currency symbol. You can omit the braces for single-character class names: `\p{C}` and `\pC` are equivalent. For a list of supported character classes, see Supported Unicode Character Classes. |
`\P{class}` | Match any character that does not belong to the specified Unicode character class. NOTE: For performance reasons, OpenText recommends that you avoid using negated character classes where possible. |
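
A few of the metacharacters above can be explored with the sketch below. It uses Python's `re` module as an approximation of the Eduction engine, which is an assumption: Python's standard `re` does not support `\p{class}`, so the hypothetical helper `currency_symbols` emulates `\p{Sc}` with `unicodedata.category` instead, and the `re.ASCII` flag is set so that `\w` and `\d` follow the ASCII definitions given in the table. All sample strings are hypothetical.

```python
import re
import unicodedata

# \b and \d: digits that form a whole word ("7" in "b7" is not a whole word).
whole_numbers = re.findall(r"\b\d+\b", "room 42, block b7, 2024", re.ASCII)

# \w: runs of word characters [A-Za-z0-9_].
words = re.findall(r"\w+", "snake_case id-7", re.ASCII)

# ^ and $ match at every line in multiline mode, while \A and \Z
# match only at the very start and very end of the text.
text = "x\nx"
line_starts = len(re.findall(r"^x", text, re.MULTILINE))
string_starts = len(re.findall(r"\Ax", text, re.MULTILINE))
line_ends = len(re.findall(r"x$", text, re.MULTILINE))
string_ends = len(re.findall(r"x\Z", text, re.MULTILINE))


def currency_symbols(s):
    """Emulate \\p{Sc}: keep characters whose Unicode category is Sc."""
    return [ch for ch in s if unicodedata.category(ch) == "Sc"]


symbols = currency_symbols("€5, $7 and £9")
```

The contrast between `line_starts` and `string_starts` (and between `line_ends` and `string_ends`) is the point of the `\A` and `\Z` rows: they never match at line breaks. How the Eduction engine treats buffers of text submitted in several pieces is not modeled here.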