By default, Eduction matches characters case sensitively, which has better performance than case insensitive matching. When you require case insensitive matching, there are several ways to configure it:
configure MatchCase.
configure individual grammars, entities, and entries with case sensitivity options, in a custom grammar file.
use case normalization.
Micro Focus recommends that you always create and use Eduction grammars that allow you to do case sensitive matching, because it has better performance. Most of the standard grammars come with entities using common and appropriate case styles. Some also have different entities for different case styles. If your data uses a consistent case, it is unlikely that you need to use case insensitive matching.
The simplest way to turn off case sensitivity is to set the MatchCase configuration parameter to False in the configuration file.
However, because this option applies to all matches, it can have significant performance impacts. In general, Micro Focus recommends that you use one of the other options to enable case insensitive matching.
When you create your own custom XML grammar files, you can configure grammars, entities, and entries individually to be case sensitive or case insensitive.
When you configure case sensitivity at a lower level, it overrides the higher level settings. Additionally, if you reference the entity in another entity, it maintains its own case sensitivity setting.
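The override rule can be illustrated with a minimal Python sketch. It is not Eduction code: the function name and arguments are hypothetical, and it assumes that the global MatchCase value acts as the fallback when no level sets case sensitivity explicitly. Each level is True (case sensitive), False (case insensitive), or None (not set).

```python
from typing import Optional


def effective_case_sensitive(
    match_case: bool,
    grammar: Optional[bool] = None,
    entity: Optional[bool] = None,
    entry: Optional[bool] = None,
) -> bool:
    """Return the case sensitivity that applies to a single entry.

    Hypothetical helper for illustration only. The most specific level
    that sets a value wins: entry, then entity, then grammar. If no level
    sets a value, the global MatchCase value is used (an assumption).
    """
    for setting in (entry, entity, grammar):
        if setting is not None:
            return setting
    return match_case


# The grammar is case insensitive, but one entity opts back in to
# case sensitive matching, so its entries are matched case sensitively.
print(effective_case_sensitive(True, grammar=False, entity=True))
```

Because the lookup runs for each entity separately, an entity that is referenced from another entity keeps the result computed from its own settings.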
Most entities in the standard grammars do not have case sensitivity set explicitly, giving you the flexibility to use case sensitivity as required in your grammars.
NOTE: If you design an entity for case insensitive matching, it is important that entries in the entity have a consistent case style to ensure that all matches are extracted correctly. You should use all lower case, all upper case, or all initial capitals, but not a mixture.
Eduction uses an optimization technique for case insensitive matching that might not extract every possible match if you do not define the entity consistently.
Case sensitive matching generally has better performance than case insensitive matching. When you require case insensitive matching, you can use case normalization to get the same performance as case sensitive matching.
When you want to use case normalization:
Do not set case sensitivity explicitly in grammars and entities.
Set the MatchCase configuration parameter to True.
Create all entries in your entities in either all lower case, or all upper case.
Set CaseNormalization to:
LOWER if all your entities are lower case.
UPPER if all your entities are upper case.
Eduction normalizes the input data accordingly before the (case sensitive) matching. This process means that both your input and grammars are all in the same case, so the matching is effectively case insensitive, with the performance benefits of case sensitive matching.
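The idea behind case normalization can be sketched in a few lines of Python. This is a conceptual illustration, not how Eduction is implemented: the function names are hypothetical, the entries are plain strings rather than grammar entities, and returning the original-case text for each match is a choice made for this sketch.

```python
def normalize(text: str, mode: str) -> str:
    """Convert text to a single case, mirroring the LOWER and UPPER values."""
    if mode == "LOWER":
        return text.lower()
    if mode == "UPPER":
        return text.upper()
    raise ValueError(f"unsupported mode: {mode}")


def find_matches(text: str, entries: list[str], mode: str) -> list[tuple[int, str]]:
    """Find entries in text by normalizing the input, then matching exactly.

    The entries must already be written in the case that `mode` selects.
    Offsets are mapped back to the original text, which assumes that
    normalization does not change the text length (true for ASCII input).
    """
    normalized = normalize(text, mode)
    matches = []
    for entry in entries:
        # Plain case sensitive search on the normalized input.
        start = normalized.find(entry)
        while start != -1:
            matches.append((start, text[start:start + len(entry)]))
            start = normalized.find(entry, start + 1)
    return sorted(matches)


# Entries are all lower case, so the input is normalized with LOWER.
print(find_matches("Call ACME or Acme today", ["acme"], "LOWER"))
# prints [(5, 'ACME'), (13, 'Acme')]
```

The search itself never compares characters case insensitively. All the case handling happens once, in the normalization step, which is why this approach keeps the cost of case sensitive matching.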
For more information about these configuration parameters, see CaseNormalization and MatchCase.