You can configure Eduction to match characters case sensitively or case insensitively. By default, it is case sensitive, which has better performance.
The simplest way to match case insensitively is to set the MatchCase configuration parameter to False in the configuration file. Alternatively, when you create your own custom XML grammar files, you can configure individual grammars, entities, and entries to be case sensitive or case insensitive. A setting at a lower level overrides the settings at higher levels. Additionally, if you reference the entity in another entity, the referenced entity keeps its own case sensitivity setting.
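The override rule can be pictured with a short Python sketch. This is not Eduction code: the function name effective_match_case and the use of None for "not set at this level" are hypothetical, and the sketch assumes that the MatchCase configuration parameter acts as the outermost default, with the grammar, entity, and entry levels overriding it in that order.

```python
from typing import Optional


def effective_match_case(config_match_case: bool,
                         grammar: Optional[bool] = None,
                         entity: Optional[bool] = None,
                         entry: Optional[bool] = None) -> bool:
    """Return True if matching ends up case sensitive.

    Hypothetical model: None means the level does not set case
    sensitivity explicitly, so it inherits from the level above.
    """
    result = config_match_case  # assumed outermost default (MatchCase)
    # Walk from the highest level to the lowest; the last explicit value wins.
    for level in (grammar, entity, entry):
        if level is not None:
            result = level
    return result


# An entity that is explicitly case insensitive overrides MatchCase=True.
print(effective_match_case(True, entity=False))
# Nothing set in the grammar: the configuration value applies.
print(effective_match_case(False))
```

In this model, an entity with no explicit setting simply follows the levels above it, which is the flexibility that leaving the setting unset is meant to give.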
Most entities in the standard grammars do not have case sensitivity set explicitly, giving you the flexibility to use case sensitivity as required in your grammars.
NOTE: If you design an entity for case-insensitive matching, it is important that entries in the entity have a consistent case style to ensure that all matches are extracted correctly. You should use all lower case, all upper case, or all initial capitals, but not a mixture.
Eduction uses an optimization technique for case insensitive matching that might not extract every possible match if you do not define the entity consistently.
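One way to catch a mixture of case styles before you compile a grammar is a small check over the entry text. The following Python sketch is a rough heuristic, not part of Eduction: the function names case_style and is_consistent are hypothetical, and the classification is deliberately simple (for example, it treats a one-letter word such as "A" as upper case).

```python
from typing import Iterable


def case_style(entry: str) -> str:
    """Classify an entry as 'lower', 'upper', 'initial', 'mixed', or 'none'."""
    if not any(c.isalpha() for c in entry):
        return "none"  # digits or punctuation only, no case to check
    if entry == entry.lower():
        return "lower"
    if entry == entry.upper():
        return "upper"
    # Initial capitals: every word starts upper case and continues lower case.
    if all(word == word.capitalize() for word in entry.split()):
        return "initial"
    return "mixed"


def is_consistent(entries: Iterable[str]) -> bool:
    """Return True if all entries share one case style and none is mixed."""
    styles = {case_style(e) for e in entries} - {"none"}
    return "mixed" not in styles and len(styles) <= 1


print(is_consistent(["London", "New York"]))  # one style throughout
print(is_consistent(["LONDON", "paris"]))     # two styles in one entity
```

Running a check like this over the entries of each entity that you intend to match case insensitively flags the entities that need to be cleaned up.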
Case sensitive matching generally has better performance than case insensitive matching. If you require case insensitive matching, you can use case normalization to give the same performance as case-sensitive matching.
When you want to use case normalization:
Do not set case sensitivity explicitly in grammars and entities.
Set the MatchCase configuration parameter to True.
Create all entries in your entities in either all lower case, or all upper case.
Set CaseNormalization to:
    LOWER if all your entities are lower case.
    UPPER if all your entities are upper case.
Eduction normalizes the input data accordingly before the (case sensitive) matching. This process means that both your input and grammars are all in the same case, so the matching is effectively case insensitive, with the performance benefits of case sensitive matching.
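The effect of case normalization can be illustrated with a minimal Python sketch. It does not use or reproduce the Eduction engine: the function find_matches and its case_normalization argument are hypothetical stand-ins, and a plain regular-expression search stands in for the case sensitive matcher.

```python
import re
from typing import Iterable, List, Optional


def find_matches(text: str,
                 entries: Iterable[str],
                 case_normalization: Optional[str] = None) -> List[str]:
    """Case sensitive search for entries, optionally normalizing the input first."""
    entries = list(entries)
    if not entries:
        return []
    # Normalize only the input; the entries are assumed to be in that case already.
    if case_normalization == "LOWER":
        text = text.lower()
    elif case_normalization == "UPPER":
        text = text.upper()
    # Prefer longer entries so that overlapping alternatives match greedily.
    ordered = sorted(entries, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(e) for e in ordered))
    return pattern.findall(text)


entries = ["london", "paris"]  # all lower case, as the steps above require
text = "LONDON and Paris and london"

print(find_matches(text, entries))           # only the exact-case occurrence
print(find_matches(text, entries, "LOWER"))  # all three occurrences
```

The matcher itself never compares case insensitively. The single normalization pass over the input does that work, which is why the approach keeps the cost of case sensitive matching. Note that this sketch returns the normalized text of each match rather than the original text.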
For more information about these configuration parameters, see CaseNormalization and MatchCase.
Micro Focus recommends that you always create and use Eduction grammars that allow you to do case sensitive matching, because it has better performance. Most of the standard grammars come with entities using common and appropriate case styles. Some also have different entities for different case styles. If your data uses a consistent case, it is unlikely that you need to use case insensitive matching.