Remove Irrelevant Content

To remove irrelevant content from HTML pages using the automatic clipping algorithm, add the parameter Clipped=TRUE to your task configuration. CFS decides which parts of the page to keep and which to discard.

The automatic clipping algorithm is designed to work with many different pages, so it might not give the best results for every page. Alternatively, you can use CSS selectors to choose which parts of the page to keep and which to discard. To clip pages with CSS selectors, add Clipped=TRUE to your task configuration, then set ClipPageUsingCssSelect to specify the parts of the page to keep and ClipPageUsingCssUnselect to specify the parts of the page to remove. These parameters accept standard CSS2 selectors.
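As a rough sketch, a task configuration that clips with CSS selectors might look like the following. The section name HtmlExtractionSettings and both selector values are hypothetical and chosen only for illustration. Only the parameter names Clipped, ClipPageUsingCssSelect, and ClipPageUsingCssUnselect come from the text above.

```
// Hypothetical section name; use the section that holds your task configuration.
[HtmlExtractionSettings]
Clipped=TRUE
// Illustrative selector: keep only the element with id "article".
ClipPageUsingCssSelect=div#article
// Illustrative selector: remove elements with class "advert".
ClipPageUsingCssUnselect=div.advert
```

In this sketch, the select parameter narrows the page to the main article container and the unselect parameter names an advertising block to remove. Check the resulting output against a sample page before relying on any particular pair of selectors.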

You can also remove scripts and hidden content from the HTML page: