In this project I developed an automatic system that uses audio input from a user to generate that user's upper-body movements on the humanoid robot Pepper. To the best of my knowledge, this system brings two novelties: it performs whole upper-body motion synthesis, including head, hand and hip movements, and it is targeted at a humanoid robot. The system was developed using only single-view RGB videos and supports both offline and online synthesis modes. Using audio-visual recordings of the upper-body movements of 19 speakers, I extracted audio and pose features and compared four 3D pose estimation methods. The estimated 3D joint positions were used to calculate angles between upper-body joints, and the resulting angle time series were then smoothed and constrained to the robot’s operating limits. To learn the mapping between audio features and upper-body pose, I trained multilayer perceptron (MLP) and long short-term memory (LSTM) neural network models in both a subject-independent (SI) and a subject-dependent (SD) manner. The system was evaluated quantitatively and qualitatively, using web surveys, when driven by natural as well as synthetic speech. My investigations show that the SD model variants outperform the SI variants, and that the MLP model is better suited to real-time motion synthesis than the LSTM because it performs online synthesis approximately five times faster. On natural speech, the movements generated by the LSTM model were rated as significantly more appropriate for the given audio than those generated by the MLP model. On synthetic speech, however, the survey respondents preferred the MLP model over the LSTM.
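The pose post-processing described above (joint angles from 3D positions, then smoothing, then constraining to operating limits) can be illustrated with a minimal sketch. This is not the implementation used in the project: the function names, the moving-average smoother, the window size and the angle limits are all assumptions made for illustration, and the angle shown is a simple interior angle at a joint rather than Pepper's actual joint parameterisation.

```python
import numpy as np


def joint_angle(a, b, c):
    """Interior angle (radians) at joint b for the chain a-b-c.

    a, b, c are arrays of shape (T, 3): one 3D position per frame.
    """
    u = a - b
    v = c - b
    cos = np.sum(u * v, axis=1) / (
        np.linalg.norm(u, axis=1) * np.linalg.norm(v, axis=1)
    )
    # Clip guards against values slightly outside [-1, 1] from rounding.
    return np.arccos(np.clip(cos, -1.0, 1.0))


def moving_average(x, window=5):
    """Centered moving average (assumed smoother); window must be odd.

    Edge padding keeps the output the same length as the input.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    pad = window // 2
    padded = np.pad(x, pad, mode="edge")
    kernel = np.ones(window) / window
    return np.convolve(padded, kernel, mode="valid")


def constrain(x, lower, upper):
    """Clamp an angle time series to [lower, upper] (hypothetical limits)."""
    return np.clip(x, lower, upper)


if __name__ == "__main__":
    # Synthetic example: an elbow held at a right angle for 10 frames.
    T = 10
    shoulder = np.tile([1.0, 0.0, 0.0], (T, 1))
    elbow = np.zeros((T, 3))
    wrist = np.tile([0.0, 1.0, 0.0], (T, 1))

    angles = joint_angle(shoulder, elbow, wrist)
    smoothed = moving_average(angles, window=5)
    # Hypothetical limits in radians, not taken from the robot's specification.
    limited = constrain(smoothed, 0.0, 1.5)
    print(limited)
```

In practice the limits would come from the robot's documented joint ranges, and the smoother could be replaced by any low-pass filter that suits the frame rate of the recordings.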
This project was part of my MEng thesis. It resulted in a journal paper:
Jan Ondras, Oya Celiktutan, Paul Bremner, Hatice Gunes
Audio-Driven Robot Upper-Body Motion Synthesis
IEEE Transactions on Cybernetics, 2020
@article{ondras2020audio,
title={Audio-Driven Robot Upper-Body Motion Synthesis},
author={Ondras, Jan and Celiktutan, Oya and Bremner, Paul and Gunes, Hatice},
journal={IEEE Transactions on Cybernetics},
year={2020},
publisher={IEEE}
}