Choose the Content to Index

This section explains how to configure the connector so that it retrieves the content that you want to index, and nothing else.

Restrict the Content to Process

The content in SharePoint is organized in the following structure.

Web Application (on-premises only)
  |- Site Collection
       |- Site
            |- Site
            |    |- ...
            |- Document Library
            |    |- File
            |    |    |- File version(s)
            |    |- Folder
            |         |- File
            |              |- File version(s)
            |- List 
                 |- List Item
                      |- Attachment(s)
                 |- Folder
                      |- List Item
                           |- Attachment(s)

There can be multiple site collections, and multiple sites within a site or site collection. There can be multiple lists and document libraries within a site, multiple folders and files within a document library, and so on.

NOTE: Instances of SharePoint Online have a single site collection at the root level, and no concept of Web Applications.

You can restrict the content to retrieve by setting the following configuration parameters:

 

The connector performs best if you choose the objects to process at the highest possible level. Take for example the following structure:

   http://sharepoint/                           Site Collection   
   http://sharepoint/site1/                     Site
   http://sharepoint/site1/List1                List
   http://sharepoint/site1/List1/Item1          List Item
   http://sharepoint/site1/List1/Item2          List Item
   http://sharepoint/site1/List2                List
   http://sharepoint/site1/List2/Item1          List Item
   http://sharepoint/site1/List2/Item2          List Item
   http://sharepoint/site2/                     Site
   http://sharepoint/site2/List1                List
   http://sharepoint/site2/List1/Item1          List Item
   http://sharepoint/site2/SubSite/             Site
   http://sharepoint/site2/SubSite/List1        List
   http://sharepoint/site2/SubSite/List1/Item1  List Item

You could ignore all content from site1 by configuring ListUrlCantHaveRegex=http://sharepoint/site1/.*, but the connector would have to process site1, and all of the lists on that site, just to determine that the lists should be ignored. A more efficient configuration is SiteUrlCantHaveRegex=http://sharepoint/site1/, because the connector can immediately determine that nothing from that site has to be processed.
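The difference in work can be illustrated with a small simulation. This is only a sketch, not the connector's implementation: the SITES dictionary and the crawl function are hypothetical, the model covers sites and lists only, and it assumes that a pattern excludes a URL when the pattern is found anywhere in that URL.

```python
import re

# Hypothetical model of the example structure: site URL -> list URLs.
SITES = {
    "http://sharepoint/site1/": [
        "http://sharepoint/site1/List1",
        "http://sharepoint/site1/List2",
    ],
    "http://sharepoint/site2/": ["http://sharepoint/site2/List1"],
    "http://sharepoint/site2/SubSite/": ["http://sharepoint/site2/SubSite/List1"],
}


def crawl(site_cant_have=None, list_cant_have=None):
    """Return (lists kept, number of objects examined) for a simulated crawl."""
    kept, examined = [], 0
    for site, lists in SITES.items():
        examined += 1  # every site URL is checked
        if site_cant_have and re.search(site_cant_have, site):
            continue  # the whole site is skipped; its lists are never enumerated
        for list_url in lists:
            examined += 1  # each list is enumerated and checked individually
            if list_cant_have and re.search(list_cant_have, list_url):
                continue
            kept.append(list_url)
    return kept, examined


by_list = crawl(list_cant_have=r"http://sharepoint/site1/.*")
by_site = crawl(site_cant_have=r"http://sharepoint/site1/")

print(by_list[0] == by_site[0])  # True: both configurations keep the same lists
print(by_list[1], by_site[1])    # 7 5: the site-level rule examines fewer objects
```

In this model both configurations keep the same lists, but the site-level rule never enumerates the lists on site1. On a real site with many lists the saving would be correspondingly larger.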

Similarly, you could ignore the content on site2, but still index the content on site2/SubSite, by configuring ListUrlCantHaveRegex=http://sharepoint/site2/List.*. However, the connector would have to process site2, and all of the lists on that site, just to determine that the lists should be ignored. A more efficient configuration is SiteUrlCantHaveRegex=http://sharepoint/site2/$, so that the connector can immediately determine that nothing from site2 has to be processed. Because the URL http://sharepoint/site2/SubSite/ does not match the regular expression http://sharepoint/site2/$, content from site2/SubSite is still processed.
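Before you apply a pattern, it can help to test it against the site URLs that you expect it to match. The following sketch uses Python's re module and assumes that a pattern excludes a URL when it is found anywhere in that URL; the connector's regular expression engine and matching rules might differ, so treat the result as indicative only. The excluded function and the SITE_URLS list are hypothetical.

```python
import re

# Site URLs taken from the example structure.
SITE_URLS = [
    "http://sharepoint/site1/",
    "http://sharepoint/site2/",
    "http://sharepoint/site2/SubSite/",
]


def excluded(pattern, urls):
    """Return the URLs in which the pattern is found (assumed search semantics)."""
    return [url for url in urls if re.search(pattern, url)]


# With the trailing $, only site2 itself is excluded.
anchored = excluded(r"http://sharepoint/site2/$", SITE_URLS)

# Without the $, the pattern is also found in the SubSite URL.
unanchored = excluded(r"http://sharepoint/site2/", SITE_URLS)

print(anchored)    # ['http://sharepoint/site2/']
print(unanchored)  # ['http://sharepoint/site2/', 'http://sharepoint/site2/SubSite/']
```

Under these assumptions, the trailing $ is what keeps site2/SubSite in scope.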

Index Content That Does Not Appear in Search Results

In SharePoint, a user can choose whether items from a list or document library are allowed to appear in search results. Users can also choose whether publishing pages (a type of list item) are allowed to appear in search engine results, by using the Search Engine Optimization (SEO) settings. In both cases, by default, the connector ignores items that are excluded from search results. You can choose to modify this behavior:

