Drawing on Robin Dunbar’s theories about hominin social bonding and the evolution of music, dance and religion, this article argues that a proto-ritualistic collective song and dance performance established ideal conditions for the evolution of fundamental elements of human cognition, such as shared intentionality, joint attention, and the ability to form mental templates. It further argues that this dance routine would have been an effective breeding ground for a psychological regulatory system that could powerfully enforce cooperation by collectively targeting individuals who do not or cannot conform to the melody or the beat, i.e. misfits who fail to achieve synchrony with the other members of the dance group. The result is a hominin society of highly cooperative individuals possessing the cognitive skills of shared intentionality, joint attention, and the ability to form mental templates: a society with all the skills necessary for the development of a mimetic form of communication, followed much later by spoken language.