When you run speech-to-text, HPE CFS adds a transcription of the speech to the document content (the DRECONTENT field).
HPE CFS can also add the start time, duration, and confidence score for each detected word, sentence boundary, and period of silence to the document metadata:
To add the start time and duration, set AddTimingsToMetadata=TRUE.
To add the confidence score, set AddConfidenceToMetadata=TRUE.
If you choose to add information to the document metadata, HPE CFS adds a metadata field named SpeechToTextWord for each detected word, sentence boundary, or period of silence.
When you set AddTimingsToMetadata=TRUE, the field includes attributes named start and duration, which describe the start time and duration in the audio:
<SpeechToTextWord start="3.1562" duration="0.3568">hello</SpeechToTextWord>
When you set AddConfidenceToMetadata=TRUE, the field includes an attribute named confidence, which describes the confidence score. The confidence score is a value between 0 (zero) and 1. Higher scores indicate greater confidence that the result is correct. For example:
<SpeechToTextWord confidence="0.9568">hello</SpeechToTextWord>
When you set AddTimingsToMetadata=TRUE and AddConfidenceToMetadata=TRUE, HPE CFS adds fields that include all of these attributes:
<SpeechToTextWord start="3.1562" duration="0.3568" confidence="0.9568">hello</SpeechToTextWord>
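As a rough sketch of how a downstream consumer might read these attributes, the following Python snippet parses a single SpeechToTextWord element with the standard library. It treats every attribute as optional, because which attributes are present depends on the two settings described above. The names WordTiming and parse_word are illustrative and not part of HPE CFS, and the snippet assumes that the field is available to you as an XML string.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass
class WordTiming:
    """Illustrative container for one SpeechToTextWord field (not an HPE CFS type)."""
    text: str
    start: Optional[float]
    duration: Optional[float]
    confidence: Optional[float]


def _optional_float(element: ET.Element, name: str) -> Optional[float]:
    # An attribute is absent when the corresponding setting was not enabled.
    value = element.get(name)
    return float(value) if value is not None else None


def parse_word(xml_fragment: str) -> WordTiming:
    """Parse one SpeechToTextWord element supplied as an XML string."""
    element = ET.fromstring(xml_fragment)
    return WordTiming(
        text=element.text or "",
        start=_optional_float(element, "start"),
        duration=_optional_float(element, "duration"),
        confidence=_optional_float(element, "confidence"),
    )


full = parse_word(
    '<SpeechToTextWord start="3.1562" duration="0.3568" '
    'confidence="0.9568">hello</SpeechToTextWord>'
)
confidence_only = parse_word(
    '<SpeechToTextWord confidence="0.9568">hello</SpeechToTextWord>'
)
```

Returning None for a missing attribute, rather than a default such as 0, keeps a field written without timings distinguishable from a word that starts at time zero.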
Fields that represent periods of silence have no value, for example:
<SpeechToTextWord start="3.1562" duration="0.3568" confidence="0.9568" />
Fields that represent sentence boundaries have a value of ".", for example:
<SpeechToTextWord start="3.1562" duration="0.3568" confidence="0.9568">.</SpeechToTextWord>
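The three kinds of field described above (word, silence, and sentence boundary) can be told apart by their value alone. The following Python sketch shows one possible way to classify the fields and rebuild sentences from them. It assumes the fields are wrapped in a single parent element, here given the hypothetical name metadata. The sample values and the function names classify and sentences are invented for illustration and do not come from HPE CFS.

```python
import xml.etree.ElementTree as ET


def classify(element: ET.Element) -> str:
    """Return 'silence', 'boundary', or 'word' for a SpeechToTextWord element."""
    text = (element.text or "").strip()
    if not text:
        return "silence"   # fields for periods of silence have no value
    if text == ".":
        return "boundary"  # fields for sentence boundaries have the value "."
    return "word"


def sentences(fields_xml: str) -> list:
    """Group detected words into sentences, skipping periods of silence."""
    root = ET.fromstring(fields_xml)
    result, current = [], []
    for element in root.iter("SpeechToTextWord"):
        kind = classify(element)
        if kind == "word":
            current.append(element.text.strip())
        elif kind == "boundary" and current:
            result.append(" ".join(current) + ".")
            current = []
    if current:
        # Trailing words with no closing sentence boundary.
        result.append(" ".join(current))
    return result


# Hypothetical sample: the wrapper element and all values are illustrative only.
SAMPLE = """<metadata>
  <SpeechToTextWord start="3.1562" duration="0.3568" confidence="0.9568">hello</SpeechToTextWord>
  <SpeechToTextWord start="3.5130" duration="0.2000" confidence="0.9000" />
  <SpeechToTextWord start="3.7130" duration="0.4000" confidence="0.8800">world</SpeechToTextWord>
  <SpeechToTextWord start="4.1130" duration="0.0100" confidence="0.7000">.</SpeechToTextWord>
  <SpeechToTextWord start="4.5000" duration="0.5000" confidence="0.9100">goodbye</SpeechToTextWord>
</metadata>"""

print(sentences(SAMPLE))  # prints ['hello world.', 'goodbye']
```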