Abstract: Voice Activity Projection (VAP) is crucial in dialogue systems for facilitating natural turn-taking. So far, models for predicting speech activity in spoken conversations and models ...