Retrieve Documents from a Repository

The following table lists the fetch actions that retrieve information from a repository.

Action | Description | Method to override
action=fetch&fetchaction=synchronize | Sends ingest commands to the ingest target to bring it up to date with what is contained in the repository. | synchronize
action=fetch&fetchaction=synchronize&identifiers=... | Forces a synchronize of the documents listed by the identifiers action parameter, whether they have changed or not. Ingest deletes are sent to the ingest target if the documents have been deleted. | synchronizeIds
action=fetch&fetchaction=Collect | Retrieves the content and metadata of the specified documents from the repository. | collect
action=View | Retrieves a single document from the repository. | view

The synchronize action has already been demonstrated in A Complete Synchronize Action and Make an Incremental Synchronize Action.

synchronizeIds, collect, and view are all passed one or more DocInfo objects. These contain no metadata or content, but each contains the identifier of a document to retrieve. Your connector must try to set the content and metadata for these documents from the repository. For an individual DocInfo (doc), indicate success or failure to retrieve the document by performing the operations shown in the following table:

Method | Success operation | Failure operation
synchronizeIds | doc.success(); | doc.failed(message);
collect | doc.success(); | doc.failed(message);
view | Return as normal | Throw an exception from the view method.

Any of these methods can also throw an exception for a fatal error, such as a network failure that prevents every document from being retrieved.
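The difference between a per-document failure and a fatal error can be sketched as follows. This is a minimal illustration, not the connector library's API: MockDocInfo and MockRepository are hypothetical stand-ins, and only the success() and failed(message) calls named in the table are modelled.

```cpp
#include <exception>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the library's DocInfo; only success() and
// failed(message) are modelled here.
struct MockDocInfo
{
   enum class Status { Unset, Success, Failed };

   std::string reference;
   Status status = Status::Unset;
   std::string error;

   void success() { status = Status::Success; }
   void failed(const std::string& message)
   {
      status = Status::Failed;
      error = message;
   }
};

// Hypothetical repository. connected == false models a fatal error that
// affects every document. A reference that starts with "missing" models
// a failure that affects one document only.
struct MockRepository
{
   bool connected = true;

   void fetch(const MockDocInfo& doc) const
   {
      if (doc.reference.rfind("missing", 0) == 0)
      {
         throw std::runtime_error("Not found: " + doc.reference);
      }
   }
};

// General shape of a synchronizeIds or collect handler.
void retrieveAll(const MockRepository& repository, std::vector<MockDocInfo>& documents)
{
   if (!repository.connected)
   {
      // Fatal: nothing can be retrieved, so abort the whole action.
      throw std::runtime_error("Repository unavailable");
   }

   for (MockDocInfo& doc : documents)
   {
      try
      {
         repository.fetch(doc);
         doc.success();
      }
      catch (const std::exception& ex)
      {
         // Per-document failure: record it and continue with the rest.
         doc.failed(ex.what());
      }
   }
}
```

With this shape, one unreadable document produces one failed entry, while an unreachable repository ends the action with a single exception instead of one failure per document.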

View and Collect Example

The collect fetch action and the view action might appear very similar, and their implementations often share most of their code. However, there are some important differences:

  • Collect is an asynchronous fetch action. It should be able to handle stop requests if the action is likely to take some time. View is a synchronous action, so it should be quick to execute.
  • Collect retrieves the content (file or text) and metadata for multiple documents when provided with the document identifiers. View retrieves the content (file) for a single document when provided with the document's identifier; metadata might also be retrieved but is discarded later.
  • Collect should handle any exception that might occur from an individual document so that remaining documents are still processed. View might throw any exception caused by the attempt to retrieve the single document.
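The differences listed above can be sketched in isolation. The example below is an assumption-laden illustration: SketchDoc, StopFlag, retrieve, collectAll, and viewOne are hypothetical names, and the stop request is modelled as a plain atomic flag because this section does not say how the library signals a stop.

```cpp
#include <atomic>
#include <cstddef>
#include <exception>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical document type; the real one belongs to the connector library.
struct SketchDoc
{
   std::string reference;
   bool succeeded = false;
   std::string error;   // Non-empty when retrieval failed.
};

// Assumed stop signal, set by another thread when a stop is requested.
using StopFlag = std::atomic<bool>;

// Hypothetical retrieval step shared by both actions.
void retrieve(const SketchDoc& doc)
{
   if (doc.reference.empty())
   {
      throw std::runtime_error("Empty reference");
   }
}

// Collect style: many documents, checks for a stop request between
// documents, and contains each document's exception.
// Returns the number of documents processed before stopping.
std::size_t collectAll(std::vector<SketchDoc>& docs, const StopFlag& stop)
{
   std::size_t processed = 0;
   for (SketchDoc& doc : docs)
   {
      if (stop.load())
      {
         break;
      }
      try
      {
         retrieve(doc);
         doc.succeeded = true;
      }
      catch (const std::exception& ex)
      {
         doc.error = ex.what();
      }
      ++processed;
   }
   return processed;
}

// View style: one document, and any exception propagates to the caller.
void viewOne(SketchDoc& doc)
{
   retrieve(doc);
   doc.succeeded = true;
}
```

Checking the flag once per document keeps the loop responsive without interrupting a retrieval that is already in progress.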

The following sample code shows how collect and view might be implemented for a basic connector, using the file system as a repository (like the connector introduced in Make an Incremental Synchronize Action):

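       // Needs <cstddef>, <exception>, <stdexcept>, <string>, and <boost/filesystem.hpp>,
       // in addition to the connector library headers.
       void collect(const CollectTask& task)
       {
          const DocInfoList& documents = task.documents();
          for (std::size_t ii = 0; ii < documents.size(); ++ii)
          {
             DocInfo docInfo = documents[ii];
             try
             {
                collectDocument(task, docInfo);
                docInfo.success();
             }
             catch (const std::exception& ex)
             {
                // Report the failure for this document only, then carry on with the rest.
                docInfo.failed(ex.what());
             }
          }
       }

       void view(const ViewTask& task)
       {
          // Any exception propagates to the caller and fails the view action.
          collectDocument(task, task.document());
       }

    private:
       // Shared by collect and view: attaches the file to the document,
       // or throws if the file does not exist.
       void collectDocument(const ConnectorTask& task, DocInfo& docInfo)
       {
          std::string filename = docInfo.id().reference();
          if (boost::filesystem::exists(filename))
          {
             docInfo.setFile(task.docInfoBuilder().createDocFile(filename, false));
          }
          else
          {
             throw std::runtime_error("File Not Found: " + filename);
          }
       }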

The view action is provided with a single document, task.document(), while the collect action is provided with multiple documents, task.documents(). Each document must be populated from the repository. Call setFile on a document to associate a file (if there is one) and update the document with any metadata.

If the information is successfully retrieved from the repository and set in the document, call the success method to indicate that the document was handled successfully. If there is a problem, call the failed method with a description of the error that can be reported to the user. If collect calls neither the success nor the failed method for a document, failure is assumed and a warning message is written to the logs.
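The default described above (no status reported means failure plus a warning) can be modelled in a few lines. This is an illustration of the rule only, not the library's implementation: TrackedDoc, Outcome, finalizeOutcomes, and the wording of the warning are all hypothetical.

```cpp
#include <string>
#include <vector>

enum class Outcome { Unset, Success, Failed };

// Hypothetical document that records which status call, if any, was made.
struct TrackedDoc
{
   std::string reference;
   Outcome outcome = Outcome::Unset;
   std::string message;

   void success() { outcome = Outcome::Success; }
   void failed(const std::string& text)
   {
      outcome = Outcome::Failed;
      message = text;
   }
};

// Models what happens after collect returns: any document left without a
// status is treated as failed. Returns the warnings that would be logged.
std::vector<std::string> finalizeOutcomes(std::vector<TrackedDoc>& docs)
{
   std::vector<std::string> warnings;
   for (TrackedDoc& doc : docs)
   {
      if (doc.outcome == Outcome::Unset)
      {
         doc.failed("No status reported");
         warnings.push_back("Warning: no status reported for " + doc.reference);
      }
   }
   return warnings;
}
```

Because an unreported document is only ever treated as a failure, calling success or failed explicitly on every code path gives the user a meaningful error message instead of a generic one.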