This EnScript is designed to read and decode unused pages from SQLite database files. These pages may contain deleted data.
The databases to be read can either be entries or records (artifacts) albeit the Current View - Selected option does not currently work with records. This is a known restriction.
By default, each database will be checked using the SQLite pragma quick_check command. Given that some of the issues identified by this command won't prevent a database from being opened, this check can be skipped.
Should a database fail a validation check, the script will write the output of the pragma quick_check command to the console.
The script targets two types of unused page: those listed in the SQLite free-page list and those contained in any write-ahead-log (WAL) file.
The script is capable of reading a SQLite file's free-page list both with and without the WAL file. This doesn't necessarily mean that more data will be recovered - the same set of free pages may be used by the database in both states.
When it comes to viewing the results, the examiner should be aware of the following.
Firstly, SQLite does not support the recovery of deleted data. so there is always a chance that such data may be corrupt, in which case it might cause the script and/or EnCase to crash.
Secondly, there are a number of different types of page contained within SQLite database files. Pages containing data from indices/tables are referred to as B-Tree pages. There are four-types of B-Tree page, three of which may contain record data of potential interest to the examiner.
Every B-Tree page stores data in cells. A cell carries a record as payload data. If a record is too large to fit in a cell then it will overflow into one or more overflow-pages. These are chained together using a 4-byte pointer, one that contains the number of the next page in the chain or zero if no more overflow-pages follow. The script will not parse records that have overflow data in case they reference overflow-pages that has since been re-used. There's no guaranteed way of knowing this and parsing such pages might cause the script to crash.
Records that don't overflow are self-contained entities containing a sequence of value-definitions followed by the values themselves. Where a value-definition specifies a value of null, 0 or 1 there will be no separate value. It should be noted that a table's INTEGER PRIMARY KEY value will typically be null in which case the "rowid" value in the cell header will be used instead.
Although it's relatively easy to parse the records stored in a page, neither records nor pages contain any reference to the table or index to which they belong. Because of this, identifying the source of each record will usually involve checking the number and type of its values and comparing them against the current set of table definitions.
This is not a straightforward process because the values in a record do not have to be of the same type as that specified by the associated table definition. For instance, a value in column that's been specified as TEXT may actually be a BLOB or vice versa. Not only that, but a SQLite table might feasibly contain data from a table that's since been deleted and is no longer referenced by the current SQLite schema.
Some SQLite recovery tools analyze the current table definitions to try and determine where a record originates from. This is not a feature currently offered by this script although it may be implemented in the future.
Because of the dynamic nature of records contained in SQLite databases, the primary output of the script will itself be a SQLite database file, one for each input file. This provides the flexibility needed to handle dynamic data-types gracefully; also to search the resultant data.
The examiner may also choose to write BLOB data into a logical evidence file (LEF). The name of each BLOB-file in the LEF will consist of an index number used to identify the input file, the row-ID of the record in the output database and the value-ID (see below). Note that a SQLite database viewer/editor may not display row-IDs by default. GZIP BLOB data can be decompressed automatically upon extraction.
Each record in the output database will contain a sequence of values marked value_0 through value_n where n is a zero-based counter representing the largest number of values found in records recovered from the input database. The actual number of values stored in any given record is to be found in the value_count field.
Because of the way in which SQLite databases work, it's not unusual for the script to recover multiple copies of the same record, or to recover records that exist in the current dataset. To try and ease this problem, the script provides the option to de-dupe recovered records. Note that this won't identify records that exist in the current dataset nor will it take into account deleted records that have the same content but different rowid numbers, or records that originate from different tables.
In addition to the output database written for each file, the examiner can choose to bookmark the unused pages that have been parsed. This can be useful when the examiner wishes to see exactly where recovered records have been located. That said, the examiner should bear in mind that a SQLite database can store a lot of information, and applications that read SQLite databases will usually restrict themselves to loading pages into memory only when they're needed. EnCase, on the other hand, stores all bookmarks in memory, so bookmarking the data found by the script may (a) take a long time and (b) result in excessive memory usage and/or instability. Bookmarks written by the script are not de-duped.
Finally, the examiner is advised that deleted SQLite data may also be found in areas that the script doesn't target, e.g. the unallocated space between cell-pointers and cells; free blocks and fragmented bytes between page-cells; defunct cell-pointers/cells; and unused space in other pages, e.g. overflow pages. Such data may prove valuable, but it's unlikely to be as complete as data found in unused pages. The main reason behind this is that SQLite tries not to write to unused pages so as to reduce disk I/O. Once a page becomes unused it's usually left alone until reallocated.
Please see the following link for technical information regarding the on-disk structure of SQLite databases:
The examiner is also advised to read the following article in order to understand why index records may also contain valuable data:
This script was developed for use in EnCase training. For more details, please click the following link:
This version provides options to skip validation-checking and automatically decompress GZIP BLOB data upon extraction.
Please upgrade to one of the following broswers: Internet Explorer 11 (or greater) or the latest version of Chrome or Firefox