This function is called directly and provides a way to configure options prior to the document conversion. Currently, the function is used for the following configurations:
Generate output without images
Generate output with verbose markup and without images. To generate output with minimal markup (ID and style paragraph attributes) and without images, set the bIndexOnly member of the KVXMLOptions structure. See KVXMLOptions.
Enable PDF position information
Include position information in the markup generated for a PDF document.
Configure PDF bookmarks
Specify whether bookmarks in a PDF file are converted to simple XLinks in the XML output.
Configure Word bookmarks
Disable the conversion of Microsoft Word bookmarks to zone elements.
Designate temporary directory
Specify a directory in which temporary files created during XML conversion processes are stored.
Configure XML conversion
Specify the elements and attributes extracted from an XML document based on the file's document type.
Enable PDF logical reading order
Convert paragraphs in PDF files in the order in which they appear on the page and with left-to-right or right-to-left paragraph direction. See Convert PDF Files to a Logical Reading Order.
Configure PDF soft hyphens
Specify whether soft hyphens are removed from the XML output. See Control Hyphenation.
Convert text and graphics that were deleted from a document with revision tracking enabled, and include revision tracking information in the XML output. See Convert Revision Tracking Information.
Protected file password
Specify the password to use to open a password-protected file for export.
Specify output character set for summary information
Specify the output character set for the document's metadata, when using fpGetSummaryInfo().
Include position and invisible text tokens (with bounding boxes) in the output
Add top, left, height, width, and rotation attributes to <p> elements.
KVErrorCode pascal KVXMLConfig( void *pContext, int nType, int nValue, void *p );
pContext | A pointer returned from fpInit() or fpInitWithLicenseData(). |
nType | The configuration flag. This is a symbolic constant defined in kvtypes.h. |
nValue | The integer value defined for the flags above. |
p | The data for the configuration flag. |
The following flags are available for the nType argument in KVXMLConfig(). These flags are defined in kvtypes.h.
Flag |
Description |
---|---|
|
If you set |
|
If you set |
KVCFG_SETMETADATACHARSET
|
This option enables you to specify the output character set for metadata when using fpGetSummaryInfo(). nValue is a character set enumerated in KVCharSet in kvtypes.h. See Convert Character Sets. This function should be called before fpGetSummaryInfo(). |
|
If you set <a xmlns:xlink="http://www.w3.org/TR/xlink" xlink:href="#bmk1">Highlight File Format</a> <a xmlns:xlink="http://www.w3.org/TR/xlink" name="bmk1"><img src="pdf14640.jpg"/> |
|
If you set A bookmark in Microsoft Word documents is a name given to a selected area of the document. The bookmark might enclose words, paragraphs, tables, table cells, lists, list items, or the entire document. In XML Export, bookmarks are converted to zone elements ( Depending on how bookmarks are defined in the original document, the creation of zone elements might result in malformed XML. In this case, you can disable zone creation to avoid these validity errors. Zone element creation is enabled by default. |
|
The To define a directory for temporary files generated during an out-of-process conversion, set the NOTE: On Windows systems, there is a 64 K size limit to the temporary directory. When the limit is reached, you must either create a new directory or delete the contents of the existing directory; otherwise, you might receive an error message. |
|
The The settings are defined in the You can also modify element extraction settings by using the |
|
The |
|
If you set Micro Focus recommends that you remove soft hyphens if you use Export to generate text output for an indexing engine or are not concerned with maintaining the document's layout. See fpConvertStream() or KVXMLConvertFile() for more information on running Export in index mode. |
If you set this flag to To reset the flag and exclude deleted content and revision tracking information from the XML output, set the flag to |
|
Set You can also toggle comment output by modifying the |
|
Set |
|
Set |
|
Set |
|
Set |
|
Set |
|
Set |
|
Set |
|
Set |
|
Set |
|
Set You can also toggle slide note output by modifying the |
|
This flag enables you to define a password used to open a password-protected file for export. See Export Password Protected Files.
|
|
KVCFG_POSITIONINFOOUTPUTTYPE
|
This flag enables you to extend the existing <p> tags to include bounding box information. |
The return value is one of the error codes defined in KVErrorCode in kverrorcodes.h.
You must call this function after the call to fpInit() or fpInitWithLicenseData() and before the call to fpConvertStream() or KVXMLConvertFile().
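One way to respect this ordering and check every return value is to collect the settings in a table and apply them in a single pass right after initialization. The sketch below is illustrative only: `DemoErrorCode`, `DEMO_SUCCESS`, `DemoConfigFn`, `DemoSetting`, `apply_settings`, and `mock_config` are hypothetical names introduced here, not part of the SDK. The function pointer type simply mirrors the KVXMLConfig() signature shown above, and the real success value should be taken from kverrorcodes.h.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for SDK declarations; assumptions for this sketch only.
   With the real SDK, use KVErrorCode and the success code from kverrorcodes.h. */
typedef int DemoErrorCode;
#define DEMO_SUCCESS 0 /* hypothetical "no error" value */

/* Same parameter list as KVXMLConfig(): context, flag, integer value, data. */
typedef DemoErrorCode (*DemoConfigFn)(void *pContext, int nType, int nValue, void *p);

/* One pending configuration call. */
typedef struct {
    int   nType;
    int   nValue;
    void *p;
} DemoSetting;

/* Applies the settings in order and stops at the first failure.
   Returns DEMO_SUCCESS, or the first error code; on failure, the index of
   the failing setting is stored in *pFailedIndex when it is not NULL. */
static DemoErrorCode apply_settings(DemoConfigFn fpConfig, void *pContext,
                                    const DemoSetting *settings, size_t count,
                                    size_t *pFailedIndex)
{
    size_t i;
    assert(fpConfig != NULL && settings != NULL);
    for (i = 0; i < count; ++i) {
        DemoErrorCode err = fpConfig(pContext, settings[i].nType,
                                     settings[i].nValue, settings[i].p);
        if (err != DEMO_SUCCESS) {
            if (pFailedIndex != NULL) {
                *pFailedIndex = i;
            }
            return err;
        }
    }
    return DEMO_SUCCESS;
}

/* Mock used only to exercise the helper: counts calls and rejects
   negative flag values with error code 1. */
static int demo_call_count = 0;
static DemoErrorCode mock_config(void *pContext, int nType, int nValue, void *p)
{
    (void)pContext; (void)nValue; (void)p;
    ++demo_call_count;
    return (nType < 0) ? 1 : DEMO_SUCCESS;
}
```

In a real application, the function pointer passed to `apply_settings` would be the configuration entry point obtained from the SDK, and the call would sit between initialization and the first conversion call.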
This function runs in-process or out of process. See Convert Files Out of Process.
When converting out of process, you must call this function after the call to KVXMLStartOOPSession() and before the call to KVXMLEndOOPSession(). See KVXMLStartOOPSession() and KVXMLEndOOPSession().
To generate verbose markup, but no images:
(*fpXMLConfig)(pKVXML, KVCFG_SUPPRESSIMAGES, TRUE, NULL);
To produce summary information in UTF8:
(*fpXMLConfig)(pKVXML, KVCFG_SETMETADATACHARSET, KVCS_UTF8, NULL);
To specify that bookmarks in a PDF file are not converted to XLinks in the XML output:
(*fpXMLConfig)(pKVXML, KVCFG_SUPPRESSTOCPRINTIMAGE, TRUE, NULL);
To disable the conversion of Microsoft Word bookmarks to zone elements:
(*fpXMLConfig)(pKVXML, KVCFG_DISABLEZONE, TRUE, NULL);
To set a directory for temporary files:
char tmpDir[250];
strcpy(tmpDir, "c:\\temp\\xmlexport");
(*fpXMLConfig)(pKVXML, KVCFG_SETTEMPDIRECTORY, 0, tmpDir);
To specify custom extraction settings for conversion of an XML file:
KVXConfigInfo xinfo;
/* populate xinfo */
(*fpXMLConfig)(pKVXML, KVCFG_SETXMLCONFIGINFO, 0, &xinfo);
To specify that PDF files are converted to a logical reading order, and that the paragraph direction for the PDF output is left to right:
(*fpXMLConfig)(pKVXML, KVCFG_LOGICALPDF, LPDF_LTR, NULL);
To specify that PDF files are converted to a logical reading order, and that the paragraph direction for the PDF output is right to left:
(*fpXMLConfig)(pKVXML, KVCFG_LOGICALPDF, LPDF_RTL, NULL);
To specify that PDF files are converted to a logical reading order, and that the paragraph direction for the PDF output is determined on the fly for each page:
(*fpXMLConfig)(pKVXML, KVCFG_LOGICALPDF, LPDF_AUTO, NULL);
To specify that soft hyphens are removed from the XML output:
(*fpXMLConfig)(pKVXML, KVCFG_DELSOFTHYPHEN, TRUE, NULL);
To convert text and graphics that are identified by revision marks:
(*fpXMLConfig)(pKVXML, KVCFG_INCLREVISIOMARK, TRUE, NULL);
To toggle hidden data output from Microsoft Word documents, use one of the KVCFG_WP flags:
(*fpXMLConfig)(pKVXML, KVCFG_WP_NOCOMMENTS, TRUE, NULL);
To toggle hidden data output from Microsoft Excel documents, use one of the KVCFG_SS flags:
(*fpXMLConfig)(pKVXML, KVCFG_SS_SHOWHIDDENINFOR, TRUE, NULL);
To toggle hidden data output from Microsoft PowerPoint documents, use one of the KVCFG_PG flags:
(*fpXMLConfig)(pKVXML, KVCFG_PG_HIDEHIDDENSLIDE, TRUE, NULL);
To specify a password to open a password-protected file for export:
(*fpXMLConfig)(pKVXML, KVCFG_SETPASSWORD, TRUE, password);
where password is a null-terminated string of 255 or fewer characters.
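Because the password must be a null-terminated string of 255 or fewer characters, it can be worth checking a candidate string before passing it to the function. The following is a minimal sketch, not SDK code: `DEMO_MAX_PASSWORD_CHARS`, `password_fits`, and `password_fits_selfcheck` are hypothetical names, and the check assumes that one character corresponds to one `char`.

```c
#include <assert.h>
#include <stddef.h>

/* Limit taken from the text above; the macro name is hypothetical. */
#define DEMO_MAX_PASSWORD_CHARS 255

/* Returns 1 if password is non-NULL and its terminating null character
   appears within the first 256 bytes (that is, the length is 255 or less).
   Returns 0 otherwise. The scan never reads past index 255. */
static int password_fits(const char *password)
{
    size_t i;
    if (password == NULL) {
        return 0;
    }
    for (i = 0; i <= DEMO_MAX_PASSWORD_CHARS; ++i) {
        if (password[i] == '\0') {
            return 1;
        }
    }
    return 0;
}

/* Small self-check illustrating the expected behavior. */
static void password_fits_selfcheck(void)
{
    assert(password_fits("secret"));
    assert(password_fits(""));
    assert(!password_fits(NULL));
}
```

A caller might run `password_fits(password)` first and make the KVCFG_SETPASSWORD call only when it returns 1.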
To include a position element in the markup for PDF documents:
(*fpXMLConfig)(pKVXML, KVCFG_ENABLEPOSITIONINFO, TRUE, NULL);
Using the PDF position element significantly changes the generated markup. For example, without the option, the XML output from a section of a PDF document looks like this:
<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE VerityXMLExport (View Source for full doctype...)>
- <VerityXMLExport>
- <WP>
- <p id="p1" font-size="33pt">
  <img src="ecpe.pdf38760.jpg" height="140px" width="292px" />
  Economic Fiscal Update
  <font size="18pt" color="#777777">Theand</font>
  <font size="14pt" color="#ffffff">October 30, 2002</font>
  <font size="29pt" color="#a4a4a4">Overview</font>
  </p>
With the option enabled, the same section of the PDF document looks like this:
<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE VerityXMLExport (View Source for full doctype...)>
- <VerityXMLExport>
- <WP>
  <Position style="position:absolute;top:534px;left:254px;font-family:'Times New Roman';font-size:33pt;white-space:nowrap;" />
  <Position style="position:absolute;top:393px;left:254px;white-space:nowrap;" />
  <img src="ecpe.pdf36000.jpg" height="140px" width="292px" />
  <Position style="position:absolute;top:308px;left:256px;font-family:'Times New Roman';font-size:33pt;white-space:nowrap;" />
  Economic
  <Position style="position:absolute;top:346px;left:256px;font-family:'Times New Roman';font-size:33pt;white-space:nowrap;" />
  Fiscal Update
  <Position style="position:absolute;top:298px;left:281px;font-family:'Times New Roman';font-size:18pt;color:#777777;background-color:#ffffff;white-space:nowrap;" />
  The
  <Position style="position:absolute;top:336px;left:299px;font-family:'Times New Roman';font-size:18pt;color:#777777;background-color:#ffffff;white-space:nowrap;" />
  and
  <Position style="position:absolute;top:543px;left:397px;font-family:'Times New Roman';font-size:14pt;color:#ffffff;background-color:#000000;white-space:nowrap;" />
  October 30, 2004
  <Position style="position:absolute;top:627px;left:382px;font-family:'Times New Roman';font-size:29pt;color:#a4a4a4;background-color:#ffffff;white-space:nowrap;" />
  Overview
To include position information in attributes of <p> tags:
(*fpXMLConfig)(pKVXML, KVCFG_ENABLEPOSITIONINFO, TRUE, NULL);
(*fpXMLConfig)(pKVXML, KVCFG_POSITIONINFOOUTPUTTYPE, KVPIOT_ATTRIBUTES, NULL);
In this mode, each piece of content output by the reader with a position is put in its own <p> element. Line break (<br/>) tags are not included in the output.
The <p> tags have position information, when this information is available from the reader. These are included in new attributes of the <p> tag: top, left, height, width, and rotation.
The top, left, width, and height attributes are all expressed in pixels. The top and left attributes give the coordinates of the top left corner of the content (an image, text box, and so on) relative to the top left corner of the page. The width and height attributes are the width and height of the content.
Rotation is expressed in degrees, and gives the clockwise rotation of the content about the top left corner. If the rotation attribute is not present, the rotation is assumed to be zero.
NOTE: Not all readers output all these attributes for all pieces of content. Only pdf2sr outputs width, height and rotation information for text. pdf2sr does not put height and width attributes on <p> tags that enclose images; rather, the <img> tags themselves have the height and width. For example:
<p id="p1" font-size="12pt" top="0px" left="0px"><img src="103453.pdf00.png" height="1261px" width="892px"/></p>
<p id="p2" font-family="MyriadPro-It" font-size="16pt" top="59px" left="129px" height="21px" width="447px"><i>Aufforderung zur Einreichung von Vorschlägen 2005: </i></p>
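A consumer of this output typically needs the pixel values as numbers. The sketch below shows one possible way to turn attribute values such as "59px" into integers and derive the right and bottom edges of an unrotated box. It is an illustration under stated assumptions: `DemoBox`, `parse_px`, `box_from_attrs`, `box_right`, and `box_bottom` are hypothetical helpers, the attribute strings are assumed to have already been extracted by an XML parser, and rotation is ignored (treated as zero).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Position of one piece of content, in pixels, relative to the
   top left corner of the page. Hypothetical type for this sketch. */
typedef struct {
    long top;
    long left;
    long height;
    long width;
} DemoBox;

/* Parses a value such as "59px" into 59.
   Returns 1 on success, 0 if value is NULL, has no digits,
   or is not followed by exactly the "px" suffix. */
static int parse_px(const char *value, long *out)
{
    char *end = NULL;
    long n;
    assert(out != NULL);
    if (value == NULL) {
        return 0;
    }
    n = strtol(value, &end, 10);
    if (end == value) {
        return 0; /* no digits were consumed */
    }
    if (strcmp(end, "px") != 0) {
        return 0; /* missing or unexpected unit */
    }
    *out = n;
    return 1;
}

/* Fills box from the four attribute strings. Returns 1 only if all parse. */
static int box_from_attrs(const char *top, const char *left,
                          const char *height, const char *width, DemoBox *box)
{
    assert(box != NULL);
    return parse_px(top, &box->top) && parse_px(left, &box->left) &&
           parse_px(height, &box->height) && parse_px(width, &box->width);
}

/* Right and bottom edges; valid only when the rotation is zero. */
static long box_right(const DemoBox *box)  { return box->left + box->width; }
static long box_bottom(const DemoBox *box) { return box->top + box->height; }
```

For the second <p> element in the example above (top 59px, left 129px, height 21px, width 447px), these helpers give a right edge of 576 and a bottom edge of 80 pixels from the page origin.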