KVSubFileInfo
This structure contains information about a subfile in a container file. It is initialized by calling fpGetSubFileInfo(). This structure is defined in kvxtract.h.
typedef struct tag_KVSubFileInfo
{
    KVStructHeader;
    char                   *subFileName;
    int                     subFileType;
    uint64_t                subFileSize;
    unsigned long           infoFlag;
    KVCharSet               charset;
    int                     isMSBLSB;
    BYTE                    fileTime[8];
    int                     parentIndex;
    int                     childCount;
    int                    *childArray;
    uint64_t                checksum;
    KVSubFileChecksumType   checksumType;
}
KVSubFileInfoRec, *KVSubFileInfo;
Member Descriptions
KVStructHeader
|
The KeyView version of the structure. See |
subFileName
|
The path, file name, or path and file name of the subfile. If the subfile is the body text of a mail file or is an embedded OLE object, KeyView provides a default file name. See Default File Names for Extracted Subfiles. |
subFileType
|
The subfile’s position in the container file’s hierarchy.
NOTE: The classification of embedded images into images, icons, content, and previews is supported only for some Microsoft Office file formats (DOC, DOCX, XLSX, PPT, PPTX). |
subFileSize
|
The approximate size of the subfile in bytes. This information might be useful if you do not want to extract very large files. The actual size may vary, depending for example on the values you set in the KVExtractSubFileArg you pass to |
infoFlag
|
A bitwise flag that provides additional information about the subfile. The following flags are available:
|
charset
|
If the subfile is not an attachment, this is the character set of the subfile. If the subfile is an attachment, the character set is KVCS_UNKNOWN. |
isMSBLSB
|
This flag indicates whether the byte order for Unicode text is Big Endian (MSBLSB) or Little Endian (LSBMSB). |
fileTime
|
When the subfile is a mail message, this is the file’s
|
parentIndex
|
The index number of this file’s parent. For example, the index of a folder in which the subfile is stored, or the file to which the subfile is attached. If a file does not have a parent, the parentIndex is -1. |
childCount
|
The number of first-level children in the subfile. |
childArray
|
A pointer to an array of first-level children in the subfile. |
checksum
|
If checksumType is KVSubFileChecksumType_CRC32, this is the checksum of the subfile. Otherwise, it must be ignored. |
checksumType
|
If the container format stores a CRC32 checksum for the subfile, and KeyView has obtained it, checksumType is set to KVSubFileChecksumType_CRC32 and the value of the checksum is stored in the checksum member variable. Otherwise, it is KVSubFileInfoChecksumType_None. Currently, checksums are obtained for file types extracted by the unzip reader. |
NOTE: KeyView does not calculate the checksums for each subfile. The checksum feature is for reporting a checksum stored in the file.
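Because KeyView reports a stored checksum rather than calculating one, an application that wants to validate an extracted subfile has to compute the CRC32 itself and compare. The following is a minimal sketch of that check, not KeyView code. DemoSubFileInfo and DemoChecksumType are hypothetical stand-ins for the checksum and checksumType members described above (a real application would include kvxtract.h and use the KeyView types), and the sketch assumes that the stored value is the standard CRC-32 used by the ZIP format.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the KeyView types; real code would use
   KVSubFileInfo and KVSubFileChecksumType from kvxtract.h. */
typedef enum { DEMO_CHECKSUM_NONE, DEMO_CHECKSUM_CRC32 } DemoChecksumType;

typedef struct {
    uint64_t         checksum;
    DemoChecksumType checksumType;
} DemoSubFileInfo;

/* Standard CRC-32 (reflected polynomial 0xEDB88320), as used by ZIP. */
static uint32_t demo_crc32(const unsigned char *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* Shift right; XOR in the polynomial if the low bit was set. */
            uint32_t mask = 0u - (crc & 1u);
            crc = (crc >> 1) ^ (0xEDB88320u & mask);
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Returns 1 if the data matches the stored checksum, 0 if it does not,
   and -1 if no checksum was reported (the checksum member is ignored). */
static int demo_verify_subfile(const DemoSubFileInfo *info,
                               const unsigned char *data, size_t len)
{
    if (info->checksumType != DEMO_CHECKSUM_CRC32) {
        return -1;
    }
    return demo_crc32(data, len) == (uint32_t)info->checksum ? 1 : 0;
}
```

The three-way return value keeps "no checksum available" distinct from "checksum mismatch", which mirrors the rule that checksum must be ignored unless checksumType indicates CRC32.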
Discussion
- Embedded images (subFileType matching KVSubFileType_EmbeddedImage, KVSubFileType_EmbeddedIcon, KVSubFileType_EmbeddedContent, and KVSubFileType_EmbeddedPreview) are not extracted unless you set ExtractImages=TRUE in the configuration file (or the flag KVFLT_EXTRACTIMAGES). However, text contained in these objects is present in the filter output from the container file. As a result, if you filter a document but also extract and filter its embedded images, the output from KeyView contains duplicate content.
  If you prefer not to see the duplicate content, you can modify your application so that it ignores these subfiles based on their subFileType. Alternatively, in the Filter API, you can set the flag KVFLT_NOEMBEDDEDOBJECT using the function fpSetConfig(). This instructs KeyView to exclude information from embedded previews (subFileType matching KVSubFileType_EmbeddedPreview) in the filter output for the container file.
- The KVSubFileType_Main type applies to the following for each file format:
  File format | KVSubFileType_Main applies to...
  MSG and EML | The message body.
  Zip files | A file inside the archive.
  PST files | An item that is not an attachment, an OLE object, or a root node.
  MBX files | A message in the MBX file.
  NSF files | An item that is not an attachment, an OLE object, or a root node.
  PDF files | An item that is not an attachment or a root node.
- If the KVSubFileInfoFlag_NeedsExtraction flag is set, open the subfile and extract its children. See fpOpenFile() and fpExtractSubFile().
- The parentIndex and childArray members provide information about the subfile’s parent and children. You can use this information to recreate the file hierarchy on extraction. Because childArray lists only the first-level children of the subfile, you must call fpGetSubFileInfo() repeatedly until you have retrieved information for the leaf-node children. See Recreate a File’s Hierarchy.
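The repeated lookups needed to reach the leaf nodes can be sketched as a depth-first walk. This is an illustration only, not the KeyView sample code. DemoNode is a hypothetical stand-in for the hierarchy members of KVSubFileInfo, demo_lookup() and the sample data stand in for the place where a real application would call fpGetSubFileInfo(), and the sketch assumes that each childArray entry is the index number of a child subfile.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the hierarchy members of KVSubFileInfo. */
typedef struct {
    const char *subFileName;
    int         parentIndex;   /* -1 when the file has no parent */
    int         childCount;    /* number of first-level children */
    const int  *childArray;    /* assumed: index numbers of those children */
} DemoNode;

/* Sample data: a root node, one folder, and two files in that folder. */
static const int demo_root_children[]   = { 1 };
static const int demo_folder_children[] = { 2, 3 };
static const DemoNode demo_nodes[] = {
    { "root",   -1, 1, demo_root_children },
    { "folder",  0, 2, demo_folder_children },
    { "a.txt",   1, 0, NULL },
    { "b.txt",   1, 0, NULL },
};

/* Hypothetical lookup; a real application would call fpGetSubFileInfo()
   here and release the structure when it is no longer needed. */
static const DemoNode *demo_lookup(int index)
{
    int count = (int)(sizeof demo_nodes / sizeof demo_nodes[0]);
    if (index < 0 || index >= count) {
        return NULL;
    }
    return &demo_nodes[index];
}

/* Depth-first walk: appends one line per subfile to out, indented by
   two spaces per level, and recurses into each first-level child. */
static void demo_walk(int index, int depth, char *out, size_t cap)
{
    const DemoNode *node = demo_lookup(index);
    if (node == NULL) {
        return;
    }
    size_t used = strlen(out);
    if (used < cap) {
        snprintf(out + used, cap - used, "%*s%s\n",
                 depth * 2, "", node->subFileName);
    }
    for (int i = 0; i < node->childCount; i++) {
        demo_walk(node->childArray[i], depth + 1, out, cap);
    }
}
```

Calling demo_walk(0, 0, buffer, sizeof buffer) on an empty string buffer fills it with the indented tree. The same recursion could create directories and extract files instead of formatting lines.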