Alternative Space and Apostrophe Characters

The PII, PHI, PCI, and PBI grammar sets accept a list of alternative characters in place of a regular space or apostrophe character. Each alternative matches wherever the regular character would, and Named Entity Recognition replaces it with the regular form in the normalized match text.

The following characters are matched as space characters:

  • No-Break Space (U+00A0)

  • En Space (U+2002)

  • Em Space (U+2003)

  • Figure Space (U+2007)

  • Medium Mathematical Space (U+205F)

  • Ideographic Space (U+3000)

The following characters are matched as apostrophe characters:

  • Modifier Letter Turned Comma (U+02BB)

  • Modifier Letter Apostrophe (U+02BC)

  • Modifier Letter Reversed Comma (U+02BD)

  • Modifier Letter Right Half Ring (U+02BE)

  • Modifier Letter Left Half Ring (U+02BF)

  • Right Single Quotation Mark (U+2019)

  • Fullwidth Apostrophe (U+FF07)
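
The effect of this normalization can be sketched outside the product. The following Python snippet is an illustration only, not the Named Entity Recognition implementation: it assumes that normalization is a one-to-one replacement of each listed character with a regular space (U+0020) or apostrophe (U+0027), and the names ALT_SPACES, ALT_APOSTROPHES, and normalize_alternatives are hypothetical.

```python
# Illustrative sketch only: mimics the documented normalization,
# not the product's actual matching code.

# Characters listed above as alternatives to a regular space (U+0020).
ALT_SPACES = "\u00A0\u2002\u2003\u2007\u205F\u3000"

# Characters listed above as alternatives to a regular apostrophe (U+0027).
ALT_APOSTROPHES = "\u02BB\u02BC\u02BD\u02BE\u02BF\u2019\uFF07"

# Build one translation table that maps every alternative to its regular form.
_TABLE = str.maketrans(
    {
        **{ch: " " for ch in ALT_SPACES},
        **{ch: "'" for ch in ALT_APOSTROPHES},
    }
)


def normalize_alternatives(text: str) -> str:
    """Replace alternative space and apostrophe characters with regular ones."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    # A no-break space and a right single quotation mark in a name.
    raw = "Mary\u00A0O\u2019Brien"
    print(normalize_alternatives(raw))
```

Under these assumptions, a name written with a no-break space and a right single quotation mark normalizes to the same text as the name typed with a regular space and apostrophe. Characters that are not in either list pass through unchanged.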