| Method | Explanation of FP-included speech samples | Example |
|---|---|---|
| TrueFP | Synthesized from text with ground-truth FPs (i.e., actually used) | I explain uh a theory. |
| PredFP | Synthesized from text with predicted FPs | I uh explain a theory. |

| Model | Explanation | α | β |
|---|---|---|---|
| Baseline | Trained without regularization | 0.0 | -- |
| Proposed | Trained with regularization for probabilistically sampled FPs | 1.0 | 4.0 |

| Speaker | Utterance | Sample | Ground-truth (Natural speech) | Baseline | Proposed |
|---|---|---|---|---|---|
| A | Sample1 | TrueFP | | | |
| A | Sample1 | PredFP | -- | | |
| A | Sample2 | TrueFP | | | |
| A | Sample2 | PredFP | -- | | |
| A | Sample3 | TrueFP | | | |
| A | Sample3 | PredFP | -- | | |
| B | Sample1 | TrueFP | | | |
| B | Sample1 | PredFP | -- | | |
| B | Sample2 | TrueFP | | | |
| B | Sample2 | PredFP | -- | | |
| B | Sample3 | TrueFP | | | |
| B | Sample3 | PredFP | -- | | |