Set up a SQL Database for Coding Information

The coding information in Fact Bank describes the entity, property, and qualifier codes in your Fact Bank system. It also defines any aliases for any of the entities, properties, and qualifiers, and maps all aliases to the same code.

You can store the coding information as a series of text files (see Create Coding Files), or in a SQL database.

You configure your Fact Bank to use a coding database by setting the CodifierType parameter in your Fact Bank system to odbc, and creating a configuration section for the coding database. For more information, see Configure the Fact Bank System.

Create a Codings SQL Database

Answer Server can connect to any RDBMS that supports SQL. The most fully tested options are:

  • SQLite
  • PostgreSQL (version 9.3 or later is required)

The SQL coding database has four required tables:

  • property_codes. Assigns a unique code to each property and qualifier in your data, as well as a canonical human-readable name and the data type.
  • properties. An inverse mapping of the property and qualifier codes, including any aliases.
  • entity_codes. Assigns a unique code to each entity in your data (that is, things for which you might want to know the values of a property).
  • entities. An inverse mapping of the entity codes, including any aliases.

Your Answer Server installation includes a PostgreSQL schema file for these tables, and a utility script that uses the psql utility to apply the schema to a database. These files are available in the /factbank/schemas/postgresql/codings directory in your installation.

The following sections describe these tables in more detail.

Property_Codes Table

The properties in your data are the values that you want to find in the Fact Bank. The property_codes table contains a unique code for each property and qualifier. This table also defines the canonical human-readable name for the property or qualifier, and sets its type.

Column Type Description
id text The ID for the property or qualifier.
code text The unique code for the property or qualifier.
canonical_name text The canonical name for the property (any aliases that you define in the properties table refer back to this name).
type text

The type of the property. You can use the following types: 

  • string. The property or qualifier values are strings.
  • time. The property or qualifier values are times in the ISO format YYYY-MM-DDTHH:NN:SS.
  • item. The property or qualifier values are entity codes. In this case, you must list the entity code in the entity_codes table, and you must list the possible values for this entity in the entities table. This option allows you to map multiple values to the same qualifier code.

Properties Table

The properties table contains the inverse mapping of the property_codes file, without the type information. You can also include aliases for a value, in a separate row.

Column Type Description
id text The ID for the property or qualifier value.
name text The name of the property or qualifier. This value can be an alias.
code_id text The ID of the row in the property_codes table that corresponds to this property or qualifier.

Entity_Codes Table

The entities are the things that you want to find the property values for. The entity_codes table assigns a unique code to each entity.

Column Type Description
id text The ID for the entity.
code text The unique code for the entity.
canonical_name text The primary name for the entity (any aliases that you define in the entities table refer back to this name).
weight integer The entity weight. If entities with similar names are returned as candidate answers, the entity with the higher weight scores more highly.

Entities Table

The entities table contains the inverse mapping of the entity_codes file. You can also include aliases for a value, in a separate row.

Column Type Description
id text The ID for the entity alias.
name text The name of the entity. This value can be an alias.
code_id text The ID of the row in the entity_codes table that corresponds to this entity.

Import Codings into a Database from Coding Files

The Answer Server installation includes a Python script utility, sql_import.py, to allow you to migrate from an existing set of coding files to a SQL codings database.

To run the script, you need:

  • an existing set of coding files
  • the details of an IDOL Content component that the script can connect to. The script uses this Content to generate stems of the canonical names that it inserts into the database. This Content must have the same stemming configuration as Fact Bank for the language that the codings apply to. Content must be running before you run the script.

To import your codings from an existing set of coding files, you run the script against the directory that contains the coding files, and provide the details of the IDOL Content component.

This script creates the required tables in your database, with the VARCHAR lengths correctly sized for your data. It also inserts the codings into the tables for the odbc fact codifier.

NOTE: The insertion is transactional. If the script is successful, it imports all the data. If there are any errors, the script does not import any data.