The following schema describes how to segment an audio file into speaker clusters labeled Cluster_0, Cluster_1, and so on.
[ClusterSpeech]
0 = a      <- audio(MONO, input)
1 = f1     <- frontend1(_, a)
2 = nf1    <- normalizer1(_, f1)
3 = w1     <- audiopreproc(A, a, nf1)
4 = f2     <- frontend2(_, a)
5 = nf2    <- normalizer2(_, f2, w1)
6 = w2     <- segment(_, nf2)
7 = w3     <- splitspeech(_, ws:w2, wc:w1, nf2)
8 = output <- wout(_, w3)
0 | The audio module reads the audio data.
1 | The frontend1 module converts the audio data (a) into speech front-end frame data.
2 | The normalizer1 module normalizes the frame data from step 1 (f1).
3 | The audiopreproc module classifies the audio (a) and normalized frame data (nf1) as Music, Speech, or Silence.
4 | The frontend2 module converts the audio data from step 0 (a) into speech front-end frame data.
5 | The normalizer2 module normalizes the frame data from step 4 (f2).
6 | The segment module finds short homogeneous acoustic segments.
7 | The splitspeech module forms the acoustic segments into speaker clusters.
8 | The wout module writes the audio speaker clusters to a file.
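The dataflow above can be sketched as plain function composition. This is a minimal illustration, not the real toolkit: every module below is a hypothetical stub whose name mirrors the schema, and the stub bodies stand in for the actual signal processing; only the wiring between steps follows the schema's dependency edges.

```python
# Hypothetical stubs for the ClusterSpeech modules; bodies are placeholders.

def audio(mode, source):        # step 0: read mono audio
    return {"samples": source, "mode": mode}

def frontend1(a):               # step 1: audio -> front-end frames
    return {"frames": a["samples"]}

def normalizer1(f1):            # step 2: normalize frames from step 1
    return {"frames": f1["frames"], "normalized": True}

def audiopreproc(a, nf1):       # step 3: label Music / Speech / Silence
    return {"labels": ["Speech"]}

def frontend2(a):               # step 4: second front-end pass over the audio
    return {"frames": a["samples"]}

def normalizer2(f2, w1):        # step 5: normalize frames from step 4
    return {"frames": f2["frames"], "normalized": True}

def segment(nf2):               # step 6: short homogeneous acoustic segments
    return [{"start": 0.0, "end": 1.0}]

def splitspeech(ws, wc, nf2):   # step 7: group segments into speaker clusters
    return {"Cluster_0": ws}

def wout(w3):                   # step 8: write the clusters out
    return list(w3)

# Wiring: one assignment per schema step, same dependencies.
a = audio("MONO", "input.wav")
f1 = frontend1(a)
nf1 = normalizer1(f1)
w1 = audiopreproc(a, nf1)
f2 = frontend2(a)
nf2 = normalizer2(f2, w1)
w2 = segment(nf2)
w3 = splitspeech(ws=w2, wc=w1, nf2=nf2)
output = wout(w3)
print(output)  # e.g. ['Cluster_0']
```

Note that the audio (a) fans out to both front ends, and the Music/Speech/Silence result (w1) feeds both the second normalizer and the final clustering step, just as in the schema.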