Supplementary material from "Bridging the prosody GAP: Genetic Algorithm with People to efficiently sample emotional prosody"

Abstract: Existing emotional speech corpora are often either highly curated to induce certain emotions from predefined categories, or make it difficult to determine whether the annotators' judgments arise from semantic or prosodic cues. To overcome this challenge, we propose a new approach called 'Genetic Algorithm with People' (GAP), which integrates human decision and production into a genetic algorithm. In our design, human creators and raters synchronously optimize the emotional prosody over each generation by introducing evolutionary pressure. We then compare GAP with state-of-the-art emotional speech corpora to examine its robustness. We demonstrate that GAP can efficiently sample from the high-dimensional emotional speech space and capture a wide variety of emotions, with no presumptions about the dimensionality of the space. Our approach has great potential to scale and extend to other languages, which would allow the creation of multilingual corpora for cross-cultural comparisons.

Contents

Supplementary figures

Sentences used in the experiment

Chain visualization

Figure S1: Interactive visualization of the created chains. Follow the instructions in the prompt.