Understanding the Rewriting Process

Access Gateway rewrites URL references for the following reasons:

  • To ensure that URL references contain the proper scheme (HTTP or HTTPS).

    If your web servers and Access Gateway machines are behind a secure firewall, you might not need SSL sessions between them and might require SSL only between the client browser and Access Gateway. For example, an HTML file accessed through Access Gateway for the website example.com might contain a URL reference to http://example.com/path/image1.jpg. If the reverse proxy for example.com/path uses SSL sessions between the browser and Access Gateway, the URL reference http://example.com/path/image1.jpg must be rewritten to https://example.com/path/image1.jpg. Otherwise, when the user clicks the HTTP link, the browser must change from HTTP to HTTPS and establish a new SSL session.

  • To ensure that URL references containing private IP addresses or private DNS names are changed to the published DNS name of Access Gateway or hosts.

    For example, suppose that a company has an internal website named data.com and wants to expose this site to Internet users through Access Gateway by using the published DNS name example.com. Many of the HTML pages of this website have URL references that contain the private DNS name, such as http://data.com/image1.jpg. Because Internet users cannot resolve the name data.com, links using this URL reference return DNS errors in the browser.

    The HTML rewriter can resolve this issue. The DNS name field in Access Gateway configuration is set to example.com, which users can resolve through a public DNS server to Access Gateway. The rewriter parses the web page, and any URL references matching the private DNS name or private IP address listed in the web server address field of Access Gateway configuration are rewritten to the published DNS name example.com and the port number of Access Gateway.

    Rewriting URL references solves two problems: URL references that were unreachable because they used private DNS names or IP addresses become accessible, and private IP addresses and DNS names, which might be sensitive information, are no longer exposed.

  • To ensure that the Host header in incoming HTTP packets contains the name understood by the internal web server.

    Using the example in Figure 2-11, suppose that the internal web server expects all HTTP or HTTPS requests to have the Host field set to data.com. When users send requests using the published DNS name example.com/path, the Host field of the packets in those requests received by Access Gateway is set to example.com. Access Gateway can be configured to rewrite this public name to the private name expected by the web server by setting the Web Server Host Name option to data.com. Before Access Gateway forwards packets to the web server, the Host field is changed (rewritten) from example.com to data.com. For more information, see Configuring Web Servers of a Proxy Service.
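The three purposes above can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not Access Gateway's implementation: the setting names PUBLISHED_HOST, PUBLISHED_SCHEME, PUBLISHED_PORT, PRIVATE_HOSTS, and WEB_SERVER_HOST_NAME are hypothetical stand-ins for the DNS name, web server address, and Web Server Host Name fields, and their values are taken from the examples above.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical settings standing in for the proxy service configuration.
PUBLISHED_HOST = "example.com"     # published DNS name of Access Gateway
PUBLISHED_SCHEME = "https"         # SSL is used between browser and gateway
PUBLISHED_PORT = 443               # listening port of Access Gateway
PRIVATE_HOSTS = {"data.com"}       # web server address field (names or IPs)
WEB_SERVER_HOST_NAME = "data.com"  # Web Server Host Name option

DEFAULT_PORTS = {"http": 80, "https": 443}


def published_netloc() -> str:
    """Return host[:port], omitting the port when it is the scheme default."""
    if DEFAULT_PORTS.get(PUBLISHED_SCHEME) == PUBLISHED_PORT:
        return PUBLISHED_HOST
    return f"{PUBLISHED_HOST}:{PUBLISHED_PORT}"


def rewrite_outbound_url(url: str) -> str:
    """Fix the scheme and hide private names in a URL sent to the browser."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if host not in PRIVATE_HOSTS and host != PUBLISHED_HOST:
        return url  # not served through this proxy service; leave untouched
    return urlunsplit(
        (PUBLISHED_SCHEME, published_netloc(), parts.path, parts.query, parts.fragment)
    )


def rewrite_inbound_headers(headers: dict[str, str]) -> dict[str, str]:
    """Replace the Host value with the name the internal web server expects."""
    return {
        name: (WEB_SERVER_HOST_NAME if name.lower() == "host" else value)
        for name, value in headers.items()
    }


print(rewrite_outbound_url("http://example.com/path/image1.jpg"))
# https://example.com/path/image1.jpg
print(rewrite_outbound_url("http://data.com/image1.jpg"))
# https://example.com/image1.jpg
print(rewrite_inbound_headers({"Host": "example.com"}))
# {'Host': 'data.com'}
```

The Host header is matched case-insensitively because HTTP header names are case-insensitive; URLs whose host is neither private nor published are returned unchanged.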

By default, Access Gateway performs the following actions when the hyperlinks in a page include published DNS name references:

  • The rewriter tries to match the scheme, domain, and port of the hyperlink with the scheme, domain, and port in the available proxy services.

    • If the entries are matched and the exact path match is found, then no rewriting happens.

    • If no exact path match is found, then the path is appended with the Remove Path on Fill option enabled.

  • If the scheme, domain, and port of the hyperlink do not match the scheme, domain, and port of any available proxy service, no rewriting happens.
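The default decision steps above can be expressed as a small classifier. This is a hedged sketch, not the product's logic: the ProxyService type and the outcome labels are hypothetical, and it assumes that an "exact path match" means the link's path equals a matching service's multi-homing path or lies beneath it. It only reports which outcome applies; it does not model how the path is appended.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}

# Possible outcomes of the default behavior described above.
EXACT_MATCH = "no rewrite (exact path match)"
APPEND_PATH = "path appended (Remove Path on Fill)"
NO_MATCH = "no rewrite (no matching proxy service)"


@dataclass(frozen=True)
class ProxyService:
    """Hypothetical view of a proxy service."""
    scheme: str
    host: str
    port: int
    path: str  # multi-homing path such as "/path"


def path_matches(path: str, service_path: str) -> bool:
    # Assumption: the link path is the service path or lies beneath it.
    return path == service_path or path.startswith(service_path.rstrip("/") + "/")


def classify_link(url: str, services: list[ProxyService]) -> str:
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = parts.hostname or ""
    port = parts.port or DEFAULT_PORTS.get(scheme, 0)
    # Step 1: match scheme, domain, and port against the proxy services.
    matching = [s for s in services if (s.scheme, s.host, s.port) == (scheme, host, port)]
    if not matching:
        return NO_MATCH
    # Step 2: check whether a matching service also matches the path.
    if any(path_matches(parts.path or "/", s.path) for s in matching):
        return EXACT_MATCH
    return APPEND_PATH
```

For example, with a single service `ProxyService("https", "example.com", 443, "/path")`, the link https://example.com/path/image1.jpg is left alone, https://example.com/other/a.html falls into the path-appending case, and http://example.com/path/a.html matches no service because its scheme and port differ.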

If the published DNS name is used as a reference, then the hyperlink URLs are rewritten. To avoid rewriting the links, set the NAGGlobalOptions NAGDisableExternalRewrite option to on.

The rewriter searches for URLs in the following contexts. A URL must meet the criteria listed for its context to be rewritten:

Context

Criteria

HTTP Headers

Qualified URL references occurring within certain types of HTTP response headers such as Location and Content-Location are rewritten. The Location header is used to redirect the browser to where the resource can be found. The Content-Location header is used to provide an alternate location where the resource can be found.

JavaScript

Within JavaScript, absolute references are always evaluated for rewriting. Relative references (such as index.html) are not rewritten. Absolute paths (such as /docs/file.html) are evaluated only if the page is read from a path-based multi-homing web server and the reference follows an HTML tag. For example, the string href='/docs/file.html' is rewritten if /docs is a multi-homing path that is configured to be removed.

HTML Tags

URL references within the following HTML tag attributes are evaluated for rewriting:

action                  archive             background
cite                    code                codebase
data                    dynsrc              filterLink
href                    longdesc            lowsrc
o:WebQuerySourceHref    onclick             onmenuclick
pluginspage             src                 usemap
usermapborderimage

References

An absolute reference is a reference that has all the information needed to locate a resource, including the hostname, such as http://internal.web.site.com/index.html. The rewriter always attempts to rewrite absolute references.

The rewriter attempts to rewrite an absolute path when it begins with the multi-homing path of a path-based multi-homing service. For example, /docs/file1.html is rewritten if /docs is a multi-homing path that has been configured to be removed.

Relative references are not rewritten.

Query Strings

URL references contained within query strings can be configured for rewriting by enabling Rewrite Inbound Query String Data.

Post Data

URL references specified in Post Data can be configured for rewriting by enabling Rewrite Inbound Post Data.
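The HTML Tags and References criteria can be combined into a small scanner built on Python's standard html.parser module. This is a sketch under stated assumptions, not the rewriter's actual parser: it checks only a subset of the listed attributes, treats /docs as a hypothetical multi-homing path configured to be removed, and reports rewrite candidates rather than rewriting them. The HTTP header, JavaScript, query string, and Post Data contexts are not modeled.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

# A subset of the HTML tag attributes listed above. HTMLParser reports
# attribute names in lower case, so they are stored that way here.
URL_ATTRIBUTES = {
    "action", "background", "cite", "codebase", "data",
    "href", "longdesc", "lowsrc", "src", "usemap",
}

# Hypothetical multi-homing paths configured to be removed.
REMOVED_MULTIHOMING_PATHS = ("/docs",)


def is_rewrite_candidate(ref: str) -> bool:
    """Apply the References rules to a single attribute value.

    A True result means the rewriter would evaluate the reference; whether
    it is actually rewritten still depends on the configured hosts.
    """
    if urlsplit(ref).netloc:
        return True  # absolute reference: includes a host name
    if ref.startswith("/"):
        # Absolute path: only when it begins with a removed multi-homing path.
        return any(ref == p or ref.startswith(p + "/") for p in REMOVED_MULTIHOMING_PATHS)
    return False  # relative reference: never rewritten


class CandidateFinder(HTMLParser):
    """Collect (tag, attribute, value) triples the rewriter would evaluate."""

    def __init__(self) -> None:
        super().__init__()
        self.candidates: list[tuple[str, str, str]] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in URL_ATTRIBUTES and value and is_rewrite_candidate(value):
                self.candidates.append((tag, name, value))


page = (
    '<a href="http://data.com/index.html">home</a>'
    '<a href="/docs/file1.html">doc</a>'
    '<img src="/images/logo.png">'
    '<a href="index.html">relative</a>'
)
finder = CandidateFinder()
finder.feed(page)
finder.close()
for candidate in finder.candidates:
    print(candidate)
```

Only the absolute reference and the /docs path are reported; /images/logo.png is skipped because /images is not a removed multi-homing path, and index.html is skipped because it is relative.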