Abstract: The cascade approach to Speech Translation
(ST) is based on a pipeline that concatenates
an Automatic Speech Recognition (ASR) sys-
tem followed by a Machine Translation (MT)
system. These systems are usually connected
by a segmenter that splits the ASR output into,
hopefully, semantically self-contained chunks
to be fed into the MT system. This is especially
challenging in the case of streaming ST, where
latency requirements must also be taken into
account. This work proposes novel segmenta-
tion models for streaming ST that incorporate
not only textual, but also acoustic information
to decide when the ASR output is split into
a chunk. Extensive experiments on the Europarl-ST
dataset demonstrate the contribution of acoustic
information to the performance of the segmentation
model in terms of BLEU score in a streaming
ST scenario. Finally, comparative results
with previous work also show the superiority
of the segmentation models proposed in this
work.