Insert Files into Hadoop
The connector's insert fetch action inserts files into the Hadoop file system.
To use the insert action, you must construct XML that specifies which file to add and where to add it.
The following XML would insert a file named source_document.txt as inserted_file.txt:
<InsertXML>
  <insert>
    <reference>hdfs://hostname/files/inserted_file.txt</reference>
    <file>
      <type>file</type>
      <content>c:\documents\source_document.txt</content>
    </file>
  </insert>
</InsertXML>
Set the reference element to the path where you want to insert the file in the Hadoop file system.
Use the file element to specify the file to insert. The insert action offers several ways to specify the source file: for example, you can provide the path and file name of the file, or provide the content of the file as base-64 encoded data. For more information about the insert action and the elements that you can set in the InsertXML, refer to the Hadoop Connector Reference.
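If you generate the InsertXML from a script rather than writing it by hand, an XML library takes care of nesting and escaping characters such as & and <. The following is a minimal Python sketch that uses only the elements shown in the example above (InsertXML, insert, reference, file, type, content) with the file type, where content holds a path to the source file. The function name build_insert_xml is hypothetical, and other file types (such as base-64 encoded content) are not covered here; check the Hadoop Connector Reference for the values they require.

```python
import xml.etree.ElementTree as ET


def build_insert_xml(hdfs_reference: str, source_path: str) -> str:
    """Hypothetical helper: build InsertXML that inserts one file by path."""
    root = ET.Element("InsertXML")
    insert = ET.SubElement(root, "insert")
    # Destination of the file in the Hadoop file system.
    ET.SubElement(insert, "reference").text = hdfs_reference
    file_elem = ET.SubElement(insert, "file")
    # Type "file": the content element holds the path of the source file.
    ET.SubElement(file_elem, "type").text = "file"
    ET.SubElement(file_elem, "content").text = source_path
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(root, encoding="unicode")


xml_text = build_insert_xml(
    "hdfs://hostname/files/inserted_file.txt",
    r"c:\documents\source_document.txt",
)
print(xml_text)
```

The output is the same XML as the example, on a single line, which is convenient because the next step is to URL-encode it.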
Add the XML to the action as the value of the InsertXML action parameter. You must URL-encode the XML before you use it in the action command. For example:
http://host:port/action=Fetch&FetchAction=Insert&ConfigSection=MyTask&InsertXML=URL-encoded InsertXML
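As an illustration of the URL-encoding step, the following Python sketch assembles the action command from the parameters shown above. It assumes the standard library's urllib.parse.quote is an acceptable encoder; passing safe="" makes it percent-encode every reserved character in the XML, including / and :, so nothing in the XML is mistaken for part of the URL. The function name build_insert_action_url, the host localhost, and the port 7008 are placeholders, not values from your installation.

```python
from urllib.parse import quote


def build_insert_action_url(host: str, port: int, config_section: str,
                            insert_xml: str) -> str:
    """Hypothetical helper: build a Fetch/Insert action command."""
    # Percent-encode the whole XML document, including '/', ':' and '\'.
    encoded_xml = quote(insert_xml, safe="")
    return (
        f"http://{host}:{port}/action=Fetch"
        f"&FetchAction=Insert"
        f"&ConfigSection={config_section}"
        f"&InsertXML={encoded_xml}"
    )


sample_xml = (
    "<InsertXML><insert>"
    "<reference>hdfs://hostname/files/inserted_file.txt</reference>"
    "<file><type>file</type>"
    r"<content>c:\documents\source_document.txt</content>"
    "</file></insert></InsertXML>"
)
url = build_insert_action_url("localhost", 7008, "MyTask", sample_xml)
print(url)
```

Encoding only the parameter value, rather than the whole URL, keeps the & separators between action parameters intact.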
Insert Files from Another Connector Into Hadoop
You can insert files retrieved by other connectors into the Hadoop file system. For example, you could use a File System Connector to retrieve files from a file system and send them to the Hadoop Connector for inserting into Hadoop.
To insert files from another connector, you must configure the other connector to:
- Set the ingestion target, in the [Ingestion] section of its configuration file, to the Hadoop Connector.
- Run a Lua script on documents before they are ingested, to modify the document references so that they are suitable for inserting into Hadoop. You can do this with the IngestActions configuration parameter.
For example, in the configuration file of the File System Connector:
[Ingestion]
EnableIngestion=True
IngesterType=Connector
IngestHost=HadoopConnectorHost
IngestPort=7008
IngestActions=Lua:script.lua
The following Lua script is an example that converts the document references produced by the File System Connector into a form that can be used with the Hadoop Connector's insert fetch action:
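function handler(config, document, params)
  local reference = document:getReference()
  -- Convert Windows path separators to forward slashes
  reference = reference:gsub("\\", "/")
  -- Collapse double slashes (for example, from UNC paths)
  reference = reference:gsub("//", "/")
  -- Replace the first colon (after the drive letter) with an underscore
  reference = reference:gsub(":", "_", 1)
  -- Prefix the HDFS location (replace with your own host)
  document:setReference("hdfs://10.1.2.33//" .. reference)
  return true
end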
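To preview the references that the Lua script produces without running a connector, you can reproduce its three substitutions offline. The following Python sketch mirrors them in the same order; it assumes the File System Connector produces Windows-style references such as C:\documents\report.txt. The Lua patterns used in the script contain no magic characters, so plain string replacement matches their behavior. The function name hadoop_reference is hypothetical, and the prefix is the same example address used in the script.

```python
def hadoop_reference(reference: str,
                     hdfs_prefix: str = "hdfs://10.1.2.33//") -> str:
    """Hypothetical offline mirror of the Lua handler's reference rewrite."""
    # Lua: reference:gsub("\\", "/")
    reference = reference.replace("\\", "/")
    # Lua: reference:gsub("//", "/")  (non-overlapping, left to right)
    reference = reference.replace("//", "/")
    # Lua: reference:gsub(":", "_", 1)  (first occurrence only)
    reference = reference.replace(":", "_", 1)
    return hdfs_prefix + reference


print(hadoop_reference(r"C:\documents\report.txt"))
# hdfs://10.1.2.33//C_/documents/report.txt
```

Replacing the drive-letter colon matters because a colon in the path portion of an hdfs:// URI is not accepted as a file name character by Hadoop.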