The following schema describes how to segment an audio file into speaker clusters labelled Cluster_0, Cluster_1, and so on.
[clusterSpeech]
0 = a <- wav3(MONO, input)
1 = w1 <- audiopreproc3(A, a)
2 = f <- frontend3(_, a)
3 = nf <- normalizer3(_, f, w1)
4 = w2 <- segment3(_, nf)
5 = w3 <- splitspeech3(_, ws:w2, wc:w1, nf)
6 = output <- wout3(_, w3)
0 | The wav module processes WAV audio data. |
1 | The audiopreproc module labels the audio (a) as Music, Speech, or Silence. |
2 | The frontend module converts the audio data (a) into speech front-end frame data. |
3 | The normalizer module normalizes the frame data (f) from step 2. |
4 | The segment module finds short, acoustically homogeneous segments. |
5 | The splitspeech module forms these segments into speaker clusters. |
6 | The wout module writes out the speaker clusters. |
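
To make the data flow between the steps easier to follow, the sketch below parses the schema text and reports which earlier steps each step depends on. It is an illustration only and not part of the product: the `parse_schema` and `dependencies` helpers are hypothetical names, and the sketch assumes that the first argument of each module call (MONO, A, or _) is a configuration slot rather than a data stream, and that a prefix such as `ws:` or `wc:` only names the input that follows it.

```python
import re

# Schema text as given above.
SCHEMA = """\
[clusterSpeech]
0 = a <- wav3(MONO, input)
1 = w1 <- audiopreproc3(A, a)
2 = f <- frontend3(_, a)
3 = nf <- normalizer3(_, f, w1)
4 = w2 <- segment3(_, nf)
5 = w3 <- splitspeech3(_, ws:w2, wc:w1, nf)
6 = output <- wout3(_, w3)
"""

# Matches lines of the form "<index> = <output> <- <module>(<args>)".
STEP = re.compile(r"^(\d+)\s*=\s*(\w+)\s*<-\s*(\w+)\((.*)\)$")


def parse_schema(text):
    """Return one dict per schema step (hypothetical helper)."""
    steps = []
    for line in text.splitlines():
        match = STEP.match(line.strip())
        if not match:
            continue  # skips the [clusterSpeech] header and blank lines
        index, output, module, raw_args = match.groups()
        args = [arg.strip() for arg in raw_args.split(",")]
        # Assumption: args[0] is a configuration slot; the rest are inputs.
        # Assumption: "ws:w2" means input w2 passed under the name ws.
        inputs = [arg.split(":")[-1] for arg in args[1:]]
        steps.append({
            "index": int(index),
            "output": output,
            "module": module,
            "config": args[0],
            "inputs": inputs,
        })
    return steps


def dependencies(steps):
    """Map each step index to the indices of the steps that feed it."""
    producer = {step["output"]: step["index"] for step in steps}
    return {
        step["index"]: sorted(
            producer[name] for name in step["inputs"] if name in producer
        )
        for step in steps
    }


if __name__ == "__main__":
    parsed = parse_schema(SCHEMA)
    for index, needs in dependencies(parsed).items():
        print(index, parsed[index]["module"], "depends on steps", needs)
```

Under these assumptions, step 5 (splitspeech3) depends on the classification from step 1, the normalized frames from step 3, and the segments from step 4, which matches the table above.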