KVSubFileInfo

This structure contains information about a subfile in a container file. It is initialized by calling fpGetSubFileInfo(). This structure is defined in kvxtract.h.

typedef struct tag_KVSubFileInfo
{
    KVStructHeader;
    char          *subFileName;
    int           subFileType;
    uint64_t      subFileSize;
    unsigned long infoFlag;
    KVCharSet     charset;
    int           isMSBLSB;
    BYTE          fileTime[8];
    int           parentIndex;
    int           childCount;
    int           *childArray;
    uint64_t      checksum;
    KVSubFileChecksumType checksumType;
}
KVSubFileInfoRec, *KVSubFileInfo;

Member Descriptions

KVStructHeader The KeyView version of the structure. See KVStructHead.
subFileName

The path, file name, or path and file name of the subfile.

If the subfile is the body text of a mail file or is an embedded OLE object, KeyView provides a default file name. See Default File Names for Extracted Subfiles.

subFileType

The subfile’s position in the container file’s hierarchy.

  • KVSubFileType_Main The subfile is at the top level of the main file. This is the default subfile type. See Discussion.
  • KVSubFileType_Attachment The subfile is an attachment in a file.
  • KVSubFileType_OLE The subfile is an embedded OLE object in a compound document.
  • KVSubFileType_Folder The subfile is a folder or the artificial root node (see Create a Root Node).
  • KVSubFileType_UncategorisedImage An embedded image that has not been categorized by the reader.
  • KVSubFileType_EmbeddedImage An embedded image.
  • KVSubFileType_EmbeddedIcon An icon used to represent an embedded file.
  • KVSubFileType_EmbeddedContent An image used to represent content for an embedded file. This could be a preview image of the actual content, or another representation such as an icon.
  • KVSubFileType_EmbeddedPreview A preview of an embedded file. This is usually an image that shows part of the embedded file.
  • KVSubFileType_XrML The subfile contains the XrML that describes the RMS protection used on an RMS-encrypted main file.

NOTE: The classification of embedded images into images, icons, content, and previews is supported only for some Microsoft Office file formats (DOC, DOCX, XLSX, PPT, PPTX).
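
As an illustration of how an application might act on subFileType, the following minimal sketch groups the image-related types listed above. It is not taken from the SDK: the enumeration values are placeholders so that the example compiles on its own (the real constants are defined in kvxtract.h), and SubFileInfoSketch and is_image_subfile() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder values so that the sketch compiles without the SDK.
   In a real application, include kvxtract.h and use its definitions. */
enum {
    KVSubFileType_Main,
    KVSubFileType_Attachment,
    KVSubFileType_OLE,
    KVSubFileType_Folder,
    KVSubFileType_UncategorisedImage,
    KVSubFileType_EmbeddedImage,
    KVSubFileType_EmbeddedIcon,
    KVSubFileType_EmbeddedContent,
    KVSubFileType_EmbeddedPreview,
    KVSubFileType_XrML
};

/* Hypothetical cut-down stand-in for KVSubFileInfoRec. */
typedef struct {
    int subFileType;
} SubFileInfoSketch;

/* Returns true when the subfile is one of the image-related types. */
static bool is_image_subfile(const SubFileInfoSketch *info)
{
    assert(info != NULL);
    switch (info->subFileType) {
    case KVSubFileType_UncategorisedImage:
    case KVSubFileType_EmbeddedImage:
    case KVSubFileType_EmbeddedIcon:
    case KVSubFileType_EmbeddedContent:
    case KVSubFileType_EmbeddedPreview:
        return true;
    default:
        return false;
    }
}
```

An application could use a check like this to skip image subfiles when it does not want to process them separately.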

subFileSize

The approximate size of the subfile in bytes. This information might be useful if you do not want to extract very large files. The actual size might vary, depending, for example, on the values you set in the KVExtractSubFileArg that you pass to fpExtractSubFile().

infoFlag

A bitwise flag that provides additional information about the subfile. The following flags are available:

  • KVSubFileInfoFlag_NeedsExtraction—The subfile might contain subfiles. It must be extracted further to conclusively determine whether it contains subfiles.
  • KVSubFileInfoFlag_Secure—The subfile is secured and credentials (such as user name and password) are required to extract it. This flag applies to ZIP, RAR, and PDF files as well as files handled by the multiarcsr reader.
  • KVSubFileInfoFlag_SMIME—The subfile is S/MIME-encrypted and credentials are required to extract it. This applies to .eml and .pst files only.
  • KVSubFileInfoFlag_External—The subfile is embedded in the main file as a link and is stored externally. For example, the subfile might be an object that was embedded in a Word document by using "Link to File," or an attachment that is referenced in an MBX message. This type of file cannot be extracted. You must write code to access the subfile based on the path in the member subFileName.
  • KVSubFileInfoFlag_MailItem—When the subfile type is KVSubFileType_Attachment, this indicates that the attachment is a mail item. This flag applies to PST, MSG, and NSF files only.
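
Because infoFlag is a bitwise flag, each value is tested with a bitwise AND. The following is a minimal sketch under stated assumptions: the numeric flag values are placeholders (the real ones come from kvxtract.h), and SubFileInfoSketch, can_extract_directly(), and may_contain_subfiles() are hypothetical names that are not part of the SDK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder flag values so that the sketch compiles without the SDK.
   In a real application, include kvxtract.h and use its definitions. */
#define KVSubFileInfoFlag_NeedsExtraction 0x01UL
#define KVSubFileInfoFlag_Secure          0x02UL
#define KVSubFileInfoFlag_SMIME           0x04UL
#define KVSubFileInfoFlag_External        0x08UL

/* Hypothetical cut-down stand-in for KVSubFileInfoRec. */
typedef struct {
    unsigned long infoFlag;
} SubFileInfoSketch;

/* True when no flag is set that requires credentials or that marks
   the subfile as stored externally. */
static bool can_extract_directly(const SubFileInfoSketch *info)
{
    assert(info != NULL);
    const unsigned long blockers = KVSubFileInfoFlag_Secure
                                 | KVSubFileInfoFlag_SMIME
                                 | KVSubFileInfoFlag_External;
    return (info->infoFlag & blockers) == 0;
}

/* True when the subfile might itself contain subfiles. */
static bool may_contain_subfiles(const SubFileInfoSketch *info)
{
    assert(info != NULL);
    return (info->infoFlag & KVSubFileInfoFlag_NeedsExtraction) != 0;
}
```

Several flags can be set at once, so test each flag individually rather than comparing infoFlag for equality.
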
charset If the subfile is not an attachment, this is the character set of the subfile. If the subfile is an attachment, the character set is KVCS_UNKNOWN.
isMSBLSB This flag indicates whether the byte order for Unicode text is Big Endian (MSBLSB) or Little Endian (LSBMSB).
fileTime

When the subfile is a mail message, this is the file’s Sent time. Otherwise, it is the last modified time. The file time is not available for the following file types:

  • EML attachments
  • OLE objects in a Microsoft Office document
  • Embedded images
parentIndex The index number of this file’s parent. For example, the index of a folder in which the subfile is stored, or the file to which the subfile is attached. If a file does not have a parent, the parentIndex is -1.
childCount The number of first-level children in the subfile.
childArray A pointer to an array of first-level children in the subfile.
checksum If checksumType is KVSubFileChecksumType_CRC32, this is the checksum of the subfile. Otherwise, it must be ignored.
checksumType If the container format stores a CRC32 checksum for the subfile, and KeyView has obtained it, checksumType is set to KVSubFileChecksumType_CRC32 and the value of the checksum is stored in the checksum member. Otherwise, it is KVSubFileInfoChecksumType_None. Currently, checksums are obtained for file types extracted by the unzip reader.

NOTE: KeyView does not calculate the checksums for each subfile. The checksum feature is for reporting a checksum stored in the file.
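
The rule that checksum is meaningful only for KVSubFileChecksumType_CRC32 can be sketched as follows. This is an illustration, not SDK code: the enumeration values are placeholders (kvxtract.h defines the real ones), and SubFileInfoSketch and get_stored_checksum() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder enumeration so that the sketch compiles without the SDK.
   In a real application, include kvxtract.h and use its definitions. */
typedef enum {
    KVSubFileInfoChecksumType_None,
    KVSubFileChecksumType_CRC32
} KVSubFileChecksumType;

/* Hypothetical cut-down stand-in for KVSubFileInfoRec. */
typedef struct {
    uint64_t              checksum;
    KVSubFileChecksumType checksumType;
} SubFileInfoSketch;

/* Copies the stored checksum to *out and returns true only when the
   container reported a CRC32. Otherwise *out is left unchanged. */
static bool get_stored_checksum(const SubFileInfoSketch *info, uint64_t *out)
{
    assert(info != NULL && out != NULL);
    if (info->checksumType != KVSubFileChecksumType_CRC32) {
        return false;   /* the checksum member must be ignored */
    }
    *out = info->checksum;
    return true;
}
```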

Discussion

  • Embedded images (subFileType matching KVSubFileType_EmbeddedImage, KVSubFileType_EmbeddedIcon, KVSubFileType_EmbeddedContent, and KVSubFileType_EmbeddedPreview) are not extracted unless you set ExtractImages=TRUE in the configuration file (or the flag KVFLT_EXTRACTIMAGES). However, text contained in these objects is present in the filter output from the container file. As a result, if you filter a document but also extract and filter its embedded images, the output from KeyView will contain duplicate content.

    If you prefer not to see the duplicate content, you can modify your application so that it ignores these subfiles based on their subFileType. Alternatively, in the Filter API, you can set the flag KVFLT_NOEMBEDDEDOBJECT using the function fpSetConfig(). This instructs KeyView to exclude information from embedded previews (subFileType matching KVSubFileType_EmbeddedPreview) in the filter output for the container file.

  • The KVSubFileType_Main type applies to the following for each file format:

    File format    KVSubFileType_Main applies to...
    -----------    --------------------------------
    MSG and EML    The message body.
    Zip files      A file inside the archive.
    PST files      An item that is not an attachment, an OLE object, or a root node.
    MBX files      A message in the MBX file.
    NSF files      An item that is not an attachment, an OLE object, or a root node.
    PDF files      An item that is not an attachment or a root node.

  • If the KVSubFileInfoFlag_NeedsExtraction flag is set, open the subfile and extract its children. See fpOpenFile() and fpExtractSubFile().

  • The parentIndex and childArray members provide information about the subfile’s parent and children. You can use this information to recreate the file hierarchy on extraction. Because childArray lists only the first-level children of the subfile, you must call fpGetSubFileInfo() repeatedly until you have retrieved information for the leaf-node children. See Recreate a File’s Hierarchy.
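
As a rough illustration of the last point, the following sketch follows parentIndex from a subfile up to the top of the hierarchy to work out how deeply the subfile is nested. It makes several assumptions: the information for every subfile has already been collected into an array indexed by subfile index (in a real application, by calling fpGetSubFileInfo() for each index), SubFileInfoSketch is a hypothetical cut-down stand-in for KVSubFileInfoRec, and subfile_depth() is not an SDK function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cut-down stand-in for KVSubFileInfoRec. */
typedef struct {
    const char *subFileName;
    int         parentIndex;   /* -1 when the subfile has no parent */
} SubFileInfoSketch;

/* Returns the number of ancestors of the subfile at 'index',
   or -1 if an index is out of range or the parent chain loops. */
static int subfile_depth(const SubFileInfoSketch *infos, int count, int index)
{
    assert(infos != NULL);
    if (index < 0 || index >= count) {
        return -1;
    }
    int depth = 0;
    int parent = infos[index].parentIndex;
    while (parent != -1) {
        /* A valid chain can never be longer than the number of subfiles. */
        if (parent < 0 || parent >= count || depth >= count) {
            return -1;
        }
        depth++;
        parent = infos[parent].parentIndex;
    }
    return depth;
}
```

The depth can be used, for example, to decide how many directory levels to create before extracting a subfile. The bounds and loop checks are defensive choices for the sketch, not behavior described by this page.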