This section contains several examples for extracting different types of information from an event. The specificity of the information extracted increases with each example. Use these examples as a starting point for creating rex expressions to suit your needs. You can also use the Regex Helper tool, which simplifies rex expression creation.
The following event examples illustrate how different rex expressions extract information.
The following rex example uses this event for illustration:
Capture matching events from the left of the pipeline and assign them to the field message. The entire event is assigned to the message field.
| rex “(?<message>[^$]*)”
This expression extracts the entire event (as shown above), starting at the word “CEF:0”.
Specify the starting point as a number of characters from the start of an event instead of a specific character or word.
| rex “[a-zA-Z0-9:\.\s]{16}(?<message>[^$]*)”
This expression starts extracting after 16 consecutive occurrences of the characters specified for text1: alphanumeric characters, colons, periods, or spaces. Although the first 16 characters of the first event are CEF:0|ArcSight|L, the extraction does not begin at “Logger|4.5.0…” because the pipe character (|), which appears near the beginning of the event, is not one of the characters being matched. Therefore, the first run of 16 consecutive matching characters is “Logger Internal ” (including the trailing space). As a result, information starting at the word Event is extracted from our example event.
Extract a specified number of characters instead of specifying an end point such as the next space or the end of the line
| rex “[a-zA-Z0-9:\.\s]{16}(?<message>[^$]{5})”
This expression only extracts the word “Event.” (See the previous sample rex expression for a detailed explanation of the reason extraction begins at the word “Event”.)
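The run-of-16 behavior described in the two expressions above can be checked outside Logger. The following is a minimal Python sketch, not Logger's own rex engine: Python's re module writes named groups as (?P<name>...) rather than (?<name>...), and the sample event is a hypothetical one invented to resemble the CEF event discussed in this section.

```python
import re

# Hypothetical CEF-style event, invented for illustration only.
event = ("CEF:0|ArcSight|Logger|4.5.0.0|logger:500|"
         "Logger Internal Event|2|cat=/Monitor dvc=192.168.35.3")

# Characters allowed in the 16-character run: alphanumerics, colon,
# period, whitespace. The pipe (|) is not in the class, so a run can
# never cross a pipe.
SKIP_16 = r"[a-zA-Z0-9:\.\s]{16}"

# Inside a character class, $ is a literal dollar sign, so [^$] means
# "any character except $".
rest = re.search(SKIP_16 + r"(?P<message>[^$]*)", event)
first5 = re.search(SKIP_16 + r"(?P<message>[^$]{5})", event)

print(rest.group("message"))    # everything after the first 16-character run
print(first5.group("message"))  # only the next five characters
```

In this sketch no pipe-delimited segment before “Logger Internal Event” is 16 characters long, so the first successful run is “Logger Internal ” and both captures begin at the word Event.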
Extract everything after “CEF:0|” into the message field. Then, pipe events for which the message field is not null through another rex expression to extract the IP address contained in the matching events and assign the IP addresses to another field, msgip. Only display events where msgip is not null.
| rex “CEF:0\|(?<message>[^$]*)” | where message is not null | rex “dvc=(?<msgip>[^ ]\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})” | where msgip is not null
Note: The colon (:) and equal sign (=) characters do not need to be escaped; however, pipe (|) characters must be escaped. The characters that need to be escaped for rex expressions are the same as the ones for regular expressions. Refer to a regular expressions document of your choice to obtain a complete list of such characters.
This expression extracts the device IP address from the event.
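The two-stage filtering in this pipeline can be sketched in Python as follows. This is an illustration under stated assumptions, not Logger's implementation: the events are hypothetical, the helper name device_ips is invented for this sketch, and Python's (?P<name>...) syntax stands in for rex's (?<name>...).

```python
import re

# Hypothetical events, invented for illustration only.
events = [
    "CEF:0|ArcSight|Logger|4.5.0.0|logger:500|Logger Internal Event|2|cat=/Monitor dvc=192.168.35.3",
    "CEF:0|ArcSight|Logger|4.5.0.0|logger:500|Logger Internal Event|2|cat=/Monitor",
    "plain syslog line with dvc=10.0.0.1 but no CEF header",
]

# The pipe must be escaped; the colon and equal sign need no escaping.
CEF_RE = re.compile(r"CEF:0\|(?P<message>[^$]*)")
DVC_RE = re.compile(r"dvc=(?P<msgip>[^ ]\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})")

def device_ips(lines):
    """Yield msgip for events that pass both stages of the pipeline."""
    for line in lines:
        if CEF_RE.search(line) is None:   # where message is not null
            continue
        dvc = DVC_RE.search(line)         # second rex runs on the raw event
        if dvc is None:                   # where msgip is not null
            continue
        yield dvc.group("msgip")

result = list(device_ips(events))
print(result)
```

Only the first hypothetical event survives both filters: the second has no dvc= value, and the third never matches “CEF:0|”.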
The following rex example uses this event for illustration:
Extract the first two IP addresses from an event and assign them to two different fields, IP1 and IP2.
| rex “(?<IP1>[^$]\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})” | rex “\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}(?<IP2>[^$]\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})”
This expression extracts the first and second IP addresses in the above event.
Because the two IP addresses are right after one another in this event, you can also specify the extraction of the two IP addresses in a single rex expression as follows:
| rex “(?<IP1>[^$]\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})(?<IP2>[^$]\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})”
Note: Do not enter any spaces in the expression.
Building on the previous example, add a new field called Ignore. Assign the value “Y” to this field if the two IP addresses extracted in the previous example are the same and assign the value “N” if the two IP addresses are different. Then, list the top IP1 and IP2 combinations for events for which the Ignore field is “N”.
| rex “(?<IP1>[^$]\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})” | rex “\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}(?<IP2>[^$]\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})” | eval Ignore=if(IP1==IP2,“Y”,“N”) | where Ignore=“N” | top IP1 IP2
Note: The eval command uses a double equal sign (==) to equate the two fields.
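The compare-then-count logic of this pipeline can be approximated in Python as shown below. It is a rough sketch only: the events are hypothetical, the pattern is simplified to two addresses separated by a single space (an assumption of this sketch, not the rex expression above), and Counter.most_common is used as a loose stand-in for the top operator.

```python
import re
from collections import Counter

# Hypothetical events, each with two adjacent IP addresses.
events = [
    "conn 192.168.10.5 192.168.10.9 allowed",
    "conn 192.168.10.5 192.168.10.9 allowed",
    "conn 10.0.0.7 10.0.0.7 loopback",
    "conn 172.16.4.2 192.168.10.9 denied",
]

IP = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"
# Simplified pattern: two IP addresses separated by exactly one space.
PAIR_RE = re.compile(rf"(?P<IP1>{IP}) (?P<IP2>{IP})")

pairs = Counter()
for event in events:
    m = PAIR_RE.search(event)
    if m is None:
        continue
    ip1, ip2 = m.group("IP1"), m.group("IP2")
    ignore = "Y" if ip1 == ip2 else "N"   # eval Ignore=if(IP1==IP2,"Y","N")
    if ignore == "N":                      # where Ignore="N"
        pairs[(ip1, ip2)] += 1

top_pairs = pairs.most_common()            # loose analogue of: top IP1 IP2
print(top_pairs)
```

The event whose two addresses are identical is flagged “Y” and dropped, so only the differing pairs are counted.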
Information captured by a rex expression can be used for further processing in a subsequent rex expression as illustrated in the following example. The first IP address is captured by the first rex expression and the network ID (assuming the first three bytes of the IP address represent it) to which the IP address belongs is extracted from the captured IP address:
logger | rex “(?<srcip>[^ ]\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})” | rex field=srcip “(?<netid>\d{1,3}\.\d{1,3}\.\d{1,3})”
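The idea of running a second expression against a previously captured field can be sketched in Python as follows. The event is hypothetical, the leading [^ ] is omitted for simplicity, and the second re.search call on the captured string plays the role of rex field=srcip; none of this is Logger's own implementation.

```python
import re

# Hypothetical event, invented for illustration only.
event = "logger: connection from 192.168.35.3 accepted"

# Stage 1: capture the first IP address from the raw event.
src = re.search(r"(?P<srcip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})", event)
srcip = src.group("srcip")

# Stage 2: apply a second pattern to the captured field only,
# keeping the first three octets as the network ID.
net = re.search(r"(?P<netid>\d{1,3}\.\d{1,3}\.\d{1,3})", srcip)
netid = net.group("netid")

print(srcip, netid)
```

Because the second search sees only the captured address, it cannot accidentally match digits elsewhere in the event.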
The following rex example uses this event for illustration:
127.0.0.1 - name [10/Oct/2010:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"
Extract all URLs from events and generate a chart of the URL counts, excluding blank URLs. (The events contain the URL string in “http://” format.)
| rex “http://(?<customURL>[^ ]*)” | where customURL is not null | chart count by customURL | sort - customURL
Note: The meta character “/” needs to be enclosed in square brackets [] to be treated literally.
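The URL count can be reproduced on the Apache event shown above with a short Python sketch. It is an approximation only: the events list (the sample event repeated, plus one hypothetical line without a URL) is invented for this sketch, and in Python's re module the forward slash has no special meaning, so it is written as-is here.

```python
import re
from collections import Counter

# The Apache access-log event shown above.
event = ('127.0.0.1 - name [10/Oct/2010:13:55:36 -0700] '
         '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
         '"http://www.example.com/start.html" '
         '"Mozilla/4.08 [en] (Win98; I ;Nav)"')
# Repeated so there is something to count; the last line is hypothetical.
events = [event, event, "event without a URL"]

URL_RE = re.compile(r"http://(?P<customURL>[^ ]*)")

counts = Counter()
for line in events:
    m = URL_RE.search(line)
    if m and m.group("customURL"):        # where customURL is not null
        counts[m.group("customURL")] += 1

# Descending order by URL, similar in spirit to: sort - customURL
for url, n in sorted(counts.items(), reverse=True):
    print(url, n)
```

Because [^ ]* stops only at a space, the value captured in this sketch includes the closing double quote that follows the URL in the event.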
The following rex example uses this event for illustration:
Extract the first word after the word “user” (one space after the word) or “user=”. The word “user” is case-insensitive in this case, and must be preceded by a space character. That is, words such as “ruser” and “suser” should not be matched.
| rex “\s[uU][sS][eE][rR][\s=](?<CustomUser>[^ ]*)”
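The matching rules for this example (case-insensitive “user”, preceded by a space, followed by a space or an equal sign) can be exercised with the Python sketch below. The events and the helper name custom_user are hypothetical, and Python's (?P<name>...) syntax replaces rex's (?<name>...).

```python
import re

# Plain character classes such as [uU] match either case. A | inside a
# character class would be a literal pipe, not alternation.
USER_RE = re.compile(r"\s[uU][sS][eE][rR][\s=](?P<CustomUser>[^ ]*)")

def custom_user(event):
    """Return the word following 'user ' or 'user=', or None if absent."""
    m = USER_RE.search(event)
    return m.group("CustomUser") if m else None

# Hypothetical events, invented for illustration only.
print(custom_user("login ok user=alice src=10.0.0.1"))
print(custom_user("login ok USER bob from console"))
print(custom_user("login ok ruser=carol"))
```

The third event is not matched because the character before “user” is “r” rather than whitespace.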