Percent Encoding in Queries

Some characters, such as commas and curly braces, are reserved as query syntax so that IDOL can distinguish between the different parts of a query. For example, to search your IDOL Content engine you might use action=query with the following parameters:

Text=*&FieldText=MATCH{one,two,three}:MyField

This query would search for all documents that have a field named "MyField" that contains the value "one", "two", or "three".

Strings in a query should be percent-encoded (also known as URL-encoded). This ensures that any commas or curly braces that are part of a string are not interpreted as query syntax. For example, to search for all documents that have a field named "MyField" that contains the single value "four,five,six", you could use the following parameters:

Text=*&FieldText=MATCH{four%2cfive%2csix}:MyField

By percent-encoding the string "four,five,six", you ensure that the commas (percent-encoded as %2c) are interpreted as part of a single query string and not as separators between multiple strings.
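As a minimal sketch of this first layer of encoding, the following Python builds both FieldText values shown above. The helper name match_field_text is hypothetical and not part of any IDOL library. Note that Python's quote() emits uppercase hexadecimal digits (%2C), which should be equivalent to %2c because percent-encoded hex digits are case-insensitive under RFC 3986.

```python
from urllib.parse import quote


def match_field_text(field, values):
    """Build a MATCH FieldText expression, percent-encoding each value.

    Hypothetical helper for illustration; not part of any IDOL library.
    """
    # safe="" makes quote() encode every reserved character, including
    # commas and curly braces, so they cannot be read as query syntax.
    encoded = [quote(value, safe="") for value in values]
    # The commas added by join() are real separators between values.
    return "MATCH{" + ",".join(encoded) + "}:" + field


three_values = match_field_text("MyField", ["one", "two", "three"])
one_value = match_field_text("MyField", ["four,five,six"])
print(three_values)  # MATCH{one,two,three}:MyField
print(one_value)     # MATCH{four%2Cfive%2Csix}:MyField
```

Only the values are encoded. The braces, separating commas, and colon that form the MATCH syntax itself are left as they are.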

You might need additional percent-encoding when sending requests to an IDOL server. When you send an HTTP request using the content-type application/x-www-form-urlencoded, you should percent-encode all parameter values. This means that any commas, curly braces, or other special characters that are part of a query string are percent-encoded twice. For example, a comma is first encoded as %2c, and the percent sign in that sequence is then encoded as %25, so the comma is represented as %252c.

NOTE: The application or library that you use to send the HTTP request might be able to percent-encode the parameter values for you, and in some cases might do this automatically.
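The two layers can be sketched in Python using only the standard library. This is an illustration of the encoding steps, not a client for IDOL: the field name and values are the ones used in this topic, and nothing is sent over the network. Python emits uppercase hexadecimal digits, so the comma appears as %252C rather than %252c.

```python
from urllib.parse import quote, urlencode

# First layer: encode the query string so that its commas are not
# treated as separators between values.
query_string = quote("four,five,six", safe="")   # four%2Cfive%2Csix
field_text = "MATCH{" + query_string + "}:MyField"

# Second layer: encode each parameter value for an
# application/x-www-form-urlencoded request body.
body = urlencode({"Text": "*", "FieldText": field_text})
print(body)
# Text=%2A&FieldText=MATCH%7Bfour%252Cfive%252Csix%7D%3AMyField
```

Here urlencode() plays the role described in the note above: the library applies the second layer, so you only need to apply the first layer yourself.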

The following examples demonstrate how to send requests using cURL, a command-line tool.

To find documents that have a field named "MyField" that contains the value "one", "two", or "three", you could use the following command. This sends a request using the application/x-www-form-urlencoded content-type. The query strings "one", "two", and "three" contain no special characters, so their percent-encoded forms are unchanged. The option --data-urlencode is used to percent-encode each parameter value:

curl http://localhost:9100/action=query --data-urlencode "Text=*" --data-urlencode "FieldText=MATCH{one,two,three}:MyField"

To find documents that have a field named "MyField" that contains the value "four,five,six", you could use the following command. The query string "four,five,six" is percent-encoded, and as before the option --data-urlencode is used to percent-encode each parameter value:

curl http://localhost:9100/action=query --data-urlencode "Text=*" --data-urlencode "FieldText=MATCH{four%2cfive%2csix}:MyField"

When IDOL receives this request, commas and other special characters in the query string will have been percent-encoded twice (such that a comma is represented by the sequence %252c):

Text=%2A&FieldText=MATCH%7Bfour%252cfive%252csix%7D%3AMyField
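To see why both layers are needed, the following Python sketch decodes the request body shown above in two stages. It illustrates the principle only and is an assumption about ordering, not a description of how IDOL is implemented.

```python
from urllib.parse import parse_qs, unquote

body = "Text=%2A&FieldText=MATCH%7Bfour%252cfive%252csix%7D%3AMyField"

# Stage 1: undo the form encoding to recover the parameter values.
params = parse_qs(body)
field_text = params["FieldText"][0]
print(field_text)   # MATCH{four%2cfive%2csix}:MyField

# Stage 2: split the MATCH{...} contents on literal commas, and only
# then decode each query string.
inner = field_text[len("MATCH{"):field_text.index("}")]
values = [unquote(value) for value in inner.split(",")]
print(values)       # ['four,five,six']
```

Because the comma inside the query string is still encoded as %2c when the value list is split, it survives as part of a single string.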

This double percent-encoding is not necessary with all content-types. For example, you could send the same queries using the multipart/form-data content-type. In both of the following commands, the query strings have been percent-encoded.

In this request, the query strings are "one", "two", and "three":

curl http://localhost:9100/action=query -F "Text=*" -F "FieldText=MATCH{one,two,three}:MyField"

In this request, there is a single query string "four,five,six":

curl http://localhost:9100/action=query -F "Text=*" -F "FieldText=MATCH{four%2cfive%2csix}:MyField"

With this content-type, IDOL Content receives the Text and FieldText parameter values in separate parts of the HTTP request body. The query strings that you supply are percent-encoded, but there is no need to percent-encode the parameter values. For example, when you send the second request, IDOL Content receives the following data:

Content-Disposition: form-data; name="Text"

*
Content-Disposition: form-data; name="FieldText"

MATCH{four%2cfive%2csix}:MyField
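The following Python sketch assembles a comparable multipart/form-data body by hand, to show that each parameter value is copied into its part unchanged. The helper build_multipart and the boundary string "example-boundary" are hypothetical; a tool such as cURL generates its own boundary and headers.

```python
def build_multipart(fields, boundary):
    """Assemble a multipart/form-data body from name/value pairs.

    Hypothetical helper for illustration; values are copied verbatim,
    with no second layer of percent-encoding.
    """
    lines = []
    for name, value in fields.items():
        lines.append("--" + boundary)
        lines.append('Content-Disposition: form-data; name="' + name + '"')
        lines.append("")       # blank line separates headers from the value
        lines.append(value)
    lines.append("--" + boundary + "--")   # closing delimiter
    lines.append("")
    return "\r\n".join(lines)


body = build_multipart(
    {"Text": "*", "FieldText": "MATCH{four%2cfive%2csix}:MyField"},
    boundary="example-boundary",
)
print(body)
```

The comma in the query string appears in the body as %2c, not %252c, because only the first layer of percent-encoding has been applied.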