
Appendix B: Regular Expressions

The UNIX Option uses regular expressions to perform the search/replace operations that are possible during both the import and publish operations.

During search/replace operations, each text file is read as a series of lines. Each line is processed by applying all of the applicable search/replace patterns to it. The patterns are applied in the order in which they were specified in the UNIX Option Setup.


Note: Because each line is processed individually, it is not possible to write a pattern that can search across multiple lines.
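The line-by-line processing described above can be sketched in Python. This is only an illustration using Python's re module, whose syntax is similar to but not identical with the UNIX Option's; the function name apply_rules and the sample rules are hypothetical and are not part of the product.

```python
import re

def apply_rules(text, rules):
    """Apply each (search, replace) rule, in order, to every line of text."""
    out_lines = []
    # splitlines() drops the line terminators, so patterns never see them.
    for line in text.splitlines():
        for search, replace in rules:
            line = re.sub(search, replace, line)
        out_lines.append(line)
    return "\n".join(out_lines)

# Hypothetical rules. Order matters: the second rule sees the output
# of the first, so "foo" ends up as "baz".
rules = [(r"foo", "bar"), (r"bar", "baz")]
ordered = apply_rules("foo\nbar one\ntwo bar", rules)
print(ordered)

# A pattern that tries to span two lines never matches, because each
# line is processed on its own.
spanning = apply_rules("one\ntwo", [(r"one\ntwo", "X")])
print(spanning == "one\ntwo")  # True
```

Reversing the two rules would leave "foo" as "bar", which is why the order in which the patterns are specified matters.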


The syntax for the regular expressions is very similar to the syntax used by the UNIX grep command. Regular expressions include both normal characters and metacharacters. Metacharacters have special meaning, or change the meaning of other regular characters. For example, if you have used the DOS prompt on a PC, you will be familiar with the dir *.* command; in this case, the asterisks (wildcards) are metacharacters, each of which stands for zero or more normal characters.

B.1 Search Patterns

The following metacharacters are supported for defining search patterns:

Metacharacter    Meaning
^ Matches the start of the line. Inside a character class, it negates the class
$ End of line
. Matches any character
[ Start of character class
] End of character class
* Matches 0 or more occurrences of the preceding regular expression
+ Matches 1 or more occurrences of the preceding regular expression
? Matches exactly 0 or 1 occurrence of the preceding regular expression
| Matches expression on either the left side or right side of it
( Start of substring
) End of substring
" Delimit character for a literal string
\ Escape character

B.2 Replace Patterns

The following metacharacters are supported for defining replace patterns:

Metacharacter    Meaning
& The string that the search pattern matched. If it is followed by a number (n) between 1 and 9, it is the string that matched substring number n.
\ Escape character

B.3 Escape Characters

The escape character is used to escape the special meaning of metacharacters.

For example, a search pattern of $HOME would fail because the $ character has a special meaning. To make this work correctly, you would specify the pattern as \$HOME; the backslash indicates that the special meaning of the character that follows it should be ignored.
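The effect of escaping can be reproduced with Python's re module, which treats $ and \ in the same way for this case. This is a sketch for illustration only; the sample line is made up, and the UNIX Option itself is not being called.

```python
import re

line = "cd $HOME/bin"

# Unescaped: "$" anchors the end of the line, so "HOME" can never follow it.
unescaped = re.search(r"$HOME", line)

# Escaped: the backslash makes "$" an ordinary character.
escaped = re.search(r"\$HOME", line)

print(unescaped)         # None
print(escaped.group(0))  # $HOME
```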

In addition, the escape character is used to define some special characters that are difficult or impossible to represent otherwise. These are termed escape sequences and the following are recognized:

Escape Sequence    Meaning
\b Backspace
\e ASCII escape character
\f Form feed
\n New line
\r Carriage return
\s Space
\t Tab
\\ Backslash character
\ddd Character specified by 1 - 3 octal digits (d)
\xdd Character specified by 1 - 2 hexadecimal digits (d)
\x^c Control character specified by letter (c)

B.4 Filename Patterns

The filename patterns on the search/replace dialogs use a standard UNIX-style wildcard matching syntax instead of full regular expressions. The following metacharacters are recognized:

Metacharacter    Meaning
* Any string of 0 or more characters
? Any single character
[] Define a character class for a single character
\ Escape any of the previous special characters. Use \\ to match a backslash
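Python's fnmatch module implements a comparable UNIX-style wildcard syntax, so it can be used to get a feel for how the *, ? and [] metacharacters behave. The file names below are invented for the example. One known difference: fnmatch does not use the backslash as an escape character (a literal metacharacter is written inside brackets, such as [*]), so the \ row of the table above is not demonstrated here.

```python
from fnmatch import fnmatchcase

# Hypothetical file names used only for this illustration.
names = ["file1.dat", "fileA.dat", "file10.dat", "notes.txt"]

# "*" matches any string of 0 or more characters.
star = [n for n in names if fnmatchcase(n, "*.dat")]

# "?" matches exactly one character, so file10.dat is excluded.
question = [n for n in names if fnmatchcase(n, "file?.dat")]

# "[0-9]" is a character class for a single digit.
char_class = [n for n in names if fnmatchcase(n, "file[0-9].dat")]

print(star)        # ['file1.dat', 'fileA.dat', 'file10.dat']
print(question)    # ['file1.dat', 'fileA.dat']
print(char_class)  # ['file1.dat']
```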

B.5 Search Examples

The following examples introduce the various metacharacters.

Search Pattern    Meaning
^Start Matches the word Start if it is the first thing on the line of text.
End$ Matches the word End if it is the last thing on a line of text

Note: The UNIX Option does not pass the line termination characters (for example, CRLF or LF) to the search pattern.
file\.dat Matches the exact word file.dat anywhere on the line of text.

Note: The escape character is used before the . since the period is a metacharacter.
file.\.dat This is an example of a metacharacter. The . matches any one valid character. This pattern matches strings such as filea.dat, fileX.dat, file9.dat, and so on.
file..\.dat Metacharacters can be used multiple times. This example matches any string that contains file, followed by exactly two characters, followed by .dat.
file..?\.dat This is an example of a repeating metacharacter. The ? character matches exactly 0 or 1 occurrences of the previous regular expression, which in this case is a . metacharacter. This example therefore matches any string that contains file, followed by 1 or 2 other characters, followed by .dat.
file.*\.dat This example contains another repeating metacharacter. The * matches 0 or more of the preceding regular expression, which again is a . metacharacter. This example matches file, followed by any number of valid characters, followed by .dat.
file[ABC]\.dat This is an example of a character class. A character class contains a list of valid characters, in this case the letters A, B and C. This pattern matches fileA.dat, fileB.dat or fileC.dat.
file[0-9]\.dat A character class can contain a range of characters; this is specified using a hyphen. This example defines a character class that matches any number from 0 to 9. This pattern matches file, followed by a numeric digit, followed by .dat.
file[0-9A-F]+\.dat This example is the most complex so far. The character class contains two ranges, 0 through 9, and A through F; that is, a hexadecimal digit. The + metacharacter matches 1 or more of the preceding regular expression, which is the character class. This pattern therefore matches file, followed by 1 or more hexadecimal digits, followed by .dat.
file(\.dat)? Substrings can be used to group multiple characters together into one logical regular expression. In this example, the \.dat pattern is within a substring followed by a ? metacharacter. The ? matches exactly 0 or 1 occurrences of the preceding regular expression, which in this case is the entire substring. This pattern therefore matches file or file.dat.
Note: Without the substring, the pattern file\.dat? would match file.da or file.dat.
file\.(dat)|(idx) This example contains substrings and the option metacharacter |. The option metacharacter matches either the regular expression on the left or the regular expression on the right. This pattern matches file. followed by dat or idx
"file.dat" When a search string is encased in double quotes, it ignores all other metacharacters within the quotes (except the escape character). This example matches file.dat
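Several of the search patterns above can be tried out with Python's re module, which accepts the same syntax for anchors, character classes, repetition and substrings. This is a sketch for experimentation only, with two caveats that are assumptions about the mapping rather than statements about the UNIX Option: the double-quote literal syntax has no Python equivalent (re.escape is used as a stand-in), and Python gives | the lowest precedence, so file\.(dat)|(idx) also matches a bare idx in Python. This appendix does not say how the UNIX Option parses that pattern. The helper name matches is hypothetical.

```python
import re

def matches(pattern, line):
    """Return the matched text, or None if the pattern is not found."""
    m = re.search(pattern, line)
    return m.group(0) if m else None

anchored_hit = matches(r"^Start", "Start here")          # 'Start'
anchored_miss = matches(r"^Start", "Do not Start")       # None

two_chars = matches(r"file..?\.dat", "open fileAB.dat")  # 'fileAB.dat'
no_chars = matches(r"file..?\.dat", "open file.dat")     # None

hex_hit = matches(r"file[0-9A-F]+\.dat", "file9F.dat")   # 'file9F.dat'
hex_miss = matches(r"file[0-9A-F]+\.dat", "fileG.dat")   # None

optional_ext = matches(r"file(\.dat)?", "file")          # 'file'

# Stand-in for the quoted literal "file.dat": escape every metacharacter.
literal_miss = matches(re.escape("file.dat"), "fileXdat")  # None

# In Python, the ungrouped alternation also matches "idx" on its own;
# grouping both alternatives restricts it to file.dat or file.idx.
loose = matches(r"file\.(dat)|(idx)", "idx only")        # 'idx'
grouped = matches(r"file\.(dat|idx)", "idx only")        # None

print(anchored_hit, two_chars, hex_hit, optional_ext, loose)
```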

B.6 Replace Examples

The true power of regular expressions becomes apparent when you can replace whatever it is that you matched as part of the search. The substring operator is essential for you to be able to set the focus on whatever it is that you want to replace.

Search Pattern    Replace Pattern    Comment
"file.dat" newfile Searches for a literal string and replaces it with a different literal string.
(.*)\.htm &1.html Searches for any string ending in .htm and replaces it with the text that matched substring 1 (everything before .htm), followed by .html.
\"file([0-9A-F]+)\.dat\" "newname&1.data" This search pattern extends one of the previous examples. It searches for a hexadecimal-based filename within quotes. The quotes are escaped, and the substring delimiters around the hexadecimal digits capture the part that is to be kept. The replacement string is newname followed by the hexadecimal digits from the search string, then the new extension. So, "file9F.dat" would become "newname9F.data".
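The replace examples above can be approximated in Python with re.sub. The mapping used here is an assumption made for illustration: the UNIX Option's &1 is written \1 in Python, and a bare & (the whole match) is written \g<0>. The input strings are invented.

```python
import re

# (.*)\.htm with &1.html: keep substring 1 and append the new extension.
renamed = re.sub(r"(.*)\.htm", r"\1.html", "index.htm")

# Quoted hexadecimal file name: only the hex digits are carried over.
quoted = re.sub(r'"file([0-9A-F]+)\.dat"', r'"newname\1.data"', '"file9F.dat"')

# Whole-match reference (a bare & in the UNIX Option, \g<0> in Python).
wrapped = re.sub(r"[0-9]+", r"<\g<0>>", "item 42")

print(renamed)  # index.html
print(quoted)   # "newname9F.data"
print(wrapped)  # item <42>
```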


Copyright © 1998 Micro Focus Limited. All rights reserved.
This document and the proprietary marks and names used herein are protected by international law.