Data Sources

The IDOL Government Eduction Package contains several kinds of entities that describe government document markings. The following sections describe the sources from which this information is compiled.

For all of these types of information, as much test data as possible is acquired so that the recall of the algorithms can be tested.
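Recall is the fraction of the markings known to be present in the test data that the algorithms actually find: recall = true positives / (true positives + false negatives). The following is a minimal sketch of that calculation. It assumes that each test document comes with a set of expected markings and that an extraction run returns a set of matched strings. The function name `recall` and the sample data are hypothetical and illustrative, not part of the package.

```python
def recall(expected: set[str], extracted: set[str]) -> float:
    """Fraction of expected markings that were extracted.

    recall = true positives / (true positives + false negatives)
    """
    if not expected:
        raise ValueError("recall is undefined when nothing is expected")
    true_positives = len(expected & extracted)   # expected and found
    false_negatives = len(expected - extracted)  # expected but missed
    return true_positives / (true_positives + false_negatives)


# Hypothetical test data: markings known to occur in a test document,
# and the markings that an extraction run returned for it.
expected = {"OFFICIAL:Sensitive", "PROTECTED", "CUI"}
extracted = {"OFFICIAL:Sensitive", "PROTECTED", "UNCLASSIFIED"}

print(recall(expected, extracted))  # 2 of 3 expected markings found
```

The extra match "UNCLASSIFIED" does not affect recall, because recall only counts missed expected items. False positives are measured by precision instead. This is why a large body of test data containing known markings is needed to estimate recall.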

Australian Government Email Markings

Australian government email marking information comes from the Australian Government Protective Security Policy Framework (PSPF), documented at https://www.protectivesecurity.gov.au.

Global Legal Entity Identifiers

Legal Entity Identifier (LEI) data is collected from the Global Legal Entity Identifier Foundation (https://www.gleif.org). Landmark data has been drawn from public sources, such as Wikipedia.

Export Numbers

Various export identifiers and codes are collected from US government sources: the Bureau of Industry and Security (https://www.bis.doc.gov/index.php/regulations/) and the International Trade Administration (https://2016.export.gov/faq/eg_main_017509.asp). Some information is also drawn from Wikipedia.

US Government CUI Markings

Controlled Unclassified Information (CUI) fields are collected from the CUI Registry of the US National Archives (https://www.archives.gov/cui/registry/). The grammar patterns are drawn from Information Security Oversight Office (ISOO) documents.

US Department of Defense Markings

The Department of Defense (DOD) markings are collected from the DOD manual Information Security Program: Marking of Information, available at https://www.esd.whs.mil/Portals/54/Documents/DD/issuances/dodm/520001m_vol2.pdf?ver=2020-08-04-112507-683.