Interspecific voice discrimination in dogs

Background and aims: Conspecific individual recognition using vocal cues has been shown in a wide range of species, but there is no published evidence that dogs are able to recognize their owner on the basis of his/her voice alone (interspecific individual recognition). Methods: In our test, dogs had to rely on vocal cues to find their hidden owner in a two-way choice task. From behind opaque screens, both the owner and a control person uttered neutral speech (reading sentences from a recipe) before the dogs were allowed to make their choice. Correct choices were reinforced with food and verbal praise. Results: Across the six choice trials, dogs chose their owner's voice significantly more often than the control person's voice. There was no learning effect across trials, and dogs showed no side preference. Discussion: Thus, dogs are able to discriminate between human voices, suggesting that they can identify their owner on the basis of vocal cues alone. This experimental design allows exploration of the role of individual acoustic parameters (such as fundamental frequency) in voice discrimination.


INTRODUCTION
Both social category recognition (SCR) and individual recognition (IR) play an important role in specific types of social interactions, including attachment relationships (both filial and monogamous), hierarchical group organization, and cooperation (Tibbetts & Dale, 2007). However, there is controversy about the conditions necessary to convincingly demonstrate these phenomena in different species and to separate the two forms of recognition on the basis of their perceptual and cognitive mechanisms. SCR is a broader phenomenon than IR: it refers to the ability of individuals to recognize, on the basis of specific cues, that a certain individual belongs to a specific social category (such as offspring or siblings). IR occurs when one organism identifies another on the basis of its individually distinctive characteristics (Dale et al., 2001). Depending on evolutionary and ecological factors, both SCR and IR can rely on specific morphological or behavioral features that were selected to facilitate specific social interactions among individuals (Thom & Hurst, 2004; Yorzinski, 2017). Such individual characteristics are often referred to as "signatures." Various dolphin species (Tursiops aduncus, Stenella plagiodon, Lagenorhynchus obliquidens, and Sousa chinensis) have been shown to possess so-called signature whistles. These specific vocalizations serve as identifiers of individuals (Janik & Sayigh, 2013; Janik et al., 2006). Dolphin calves learn these signature whistles from their mothers (Fripp et al., 2005; King et al., 2013) and then modify them to make them unique.
Signals used for communication may also be characterized by idiosyncratic cues. These could also facilitate the emergence of IR if members of the species have the perceptual and cognitive skills required to rely on these phenotypic variations when discriminating among individuals. Visual, acoustic, or olfactory (Brown & Johnston, 1983) signals may all have this potential. For example, vocalizations could offer this possibility because the quality of the emitted sound depends on many morphological structures that vary from individual to individual (Fitch et al., 2002; Taylor & Reby, 2010).
In contrast to intraspecific IR, which emerges naturally in many animal species across all taxa (Thom & Hurst, 2004; Tibbetts & Dale, 2007; Yorzinski, 2017), interspecific IR is rare. A typical scenario for interspecific IR is the cohabitation of humans and domesticated animals. Although it is common knowledge that humans are able to recognize individual animals they interact with, this skill has rarely been tested under laboratory conditions. In one study, humans were not able to match the barks of two unfamiliar dogs of the same breed to a sample bark, suggesting that dog vocalizations may not provide enough cues for humans to distinguish among individuals (Molnár et al., 2006). It is more likely that humans rely predominantly on visual or multimodal cues when recognizing their own companion animals or individuals of other species with whom they share their lives.
Not much more is known about similar skills in domesticated species. For example, sheep have been reported to choose familiar over unfamiliar humans (Knolle et al., 2017); however, this does not necessarily involve IR. Using a preferential looking paradigm, Proops and McComb (2012) provided evidence that horses are able to match the voice of a familiar person to that person standing at a distance of 6 m.
There are only a few studies describing SCR in dogs, some of which also refer to intraspecific IR. Hepper (1994) reported that 4- to 5.5-week-old puppies are able to recognize their siblings and their mother, and that mothers can recognize their offspring even after 2 years of separation. Adult family dogs showed more comforting behavior toward a familiar dog than toward an unfamiliar dog after hearing the whining of the familiar one (Quervel-Chaumette et al., 2016). In both cases, familiarity and similarity could explain the preferential choice, and in any case SCR is more likely than IR.
With regard to interspecific recognition, Adachi et al. (2007) showed that dogs looked longer at a screen after hearing their owner's voice if a picture of a stranger was displayed. Although this study has often been cited as evidence of owner recognition, we propose that it only demonstrates that dogs are able to form visual expectations on the basis of vocal cues. There is no evidence that dogs can actually match these cues to their owner or that vocal cues alone are sufficient for IR. A much stricter experimental protocol (see Proops & McComb, 2012) would have been needed for unequivocal evidence.
Anecdotal evidence suggests that dogs are able to recognize their owner's voice in various contexts. This study aimed to provide a methodological basis for revealing whether dogs are able to find their owner on the basis of vocal cues alone. In a two-way choice task, dogs had to discriminate their owner's voice from a control person's voice while both persons were hidden behind opaque screens. We hypothesized that dogs would choose their owner significantly more often than the control person in a single six-trial session.

METHODS
Subjects
We tested 27 dogs [24 purebreds of 14 breeds and 3 mixed-breed dogs; mean age ± SD = 4.83 ± 2.42 years, range: 1-9 years; 16 females and 11 males] living in human families. Two additional dogs were excluded after the training because they failed to choose between the screens during the training trials (they did not leave the starting point in three out of four trials).
Twenty-four dogs had female and three had male owners.

Experimental setting
The preference test took place in a small test room (5.4 m × 3 m; Fig. 1a) with two doors (G and I) at the Department of Ethology, Eötvös Loránd University, Budapest, Hungary. The training and the voice discrimination test were conducted in the adjacent larger test room (5.4 m × 6.27 m; Fig. 1b).

Experimental protocol
The experiment consisted of three phases: (a) preference test, (b) training, and (c) voice discrimination test (Table 1).
For each dog, three persons were present during the tests: the owner, an experimenter (two persons alternated in this role across dogs), and a control person (five persons played this role; the sex of the owner and the control person was always matched). Experimenters and control persons were otherwise randomized and balanced across dogs. Before the preference test, the control person spent about 4-5 min getting acquainted with the dog, petting it and talking to it.

Paradigm of the training and the voice discrimination test
The goal of the training was to teach dogs the paradigm (using only the owner's voice); in the voice discrimination test, dogs had to discriminate the owner's voice from a control person's voice. The dog's task was to find the owner behind one of two opaque (blue) screens (height: 1.25 m; width of the two wings: 2 × 1.02 m) on the basis of his/her voice alone (Fig. 1b, D and E). We placed a plastic wall (color: blue; height: 1.02 m; length: 3.50 m) between the screens so that the dog had to make a clear choice already at the starting point ("F"), which was 3.46 m from the tip of both screens. In a previous experiment, Polgár et al. (2015) showed that dogs could not find their hidden owners from a distance of 3 m on the basis of olfactory cues; thus, we can assume that they relied only on vocal cues here. With this experimental setup, however, we cannot exclude the possibility that vocal and olfactory cues were sufficient to find the hidden owner only in combination. We note that even if both olfactory and vocal cues are needed, some knowledge about the owner's individual voice is still required. During the trials, the owner (and, in the voice discrimination test, the control person) hid semi-randomly behind one of the two opaque screens (the same person hid on the same side at most two times in a row, and the hiding side was balanced across trials). Between trials, the owner (and, in the voice discrimination test, the control person) waited at the starting location ("F"), while the experimenter took the dog on the leash and the two of them left the room through door "H." In their absence, the owner (and the control person) hid behind the predetermined screen in a crouching position, and the trial started. After about 15-20 s (timed by the experimenter), the experimenter led the dog in and stood at location "F" with the dog sitting in front of him/her, holding its collar or leash. Upon hearing the experimenter's signal ("Now!"), the hiding person(s) started to talk for about 3 s (see the details below). At the end of the speech, the experimenter released the dog to find its owner (if necessary, accompanied by the signal "You can go!" or "Run!"). After a successful owner choice, the owner praised the dog verbally, petted it, and rewarded it with food. The handling of unsuccessful owner choices is described below for each test phase. Including the hiding and rewarding phases, one trial lasted about 1-2 min, depending on how quickly the dog chose. Calls and recipe sentences were used as acoustic stimuli, depending on the test phase. The recipe sentences were food recipe sentences of similar length, chosen randomly by the experimenters. Three sets of neutral stimuli were used, randomized and balanced across dogs. Each set included different sentences spoken by the owner in the training (training trials 3-4) and by the owner and the control person in the voice discrimination test (trials 1-6). That is, each sentence was spoken only once during the testing of each dog. Within each trial, each speaker read two different sentences.
Preference test. The preference test was run to find out whether dogs would choose the owner or the control person in a face-to-face situation, that is, to measure the dogs' initial preference. Although we later trained the dogs to choose their owner in the voice discrimination test, a potential preference for the control person over the owner could have resulted in motivational differences during their choices. Thus, the aim of the preference test was to confirm that dogs choose their owners under "typical circumstances." This test was performed in the smaller test room (Fig. 1a) and lasted about 3 min. First, the dog was allowed to explore the room for a short time (∼1-2 min) while the experimenter explained the preference test to the owner. In this exploration phase, all three persons (experimenter, owner, and control person) stood around the middle of the smaller test room and had no contact with the dog. Then, the dog was positioned at location "C," where the experimenter held it by its collar in front of her. Meanwhile, the owner and the control person positioned themselves semi-randomly (balanced across dogs) at location "A" or "B," facing the dog. With this randomization, we controlled for a possible side bias and for the potential effect of door "G," which was located next to location "A." Before the trial, the owner and the control person were instructed to try to motivate the dog to approach them using verbal cues. At the experimenter's signal, the owner and the control person started calling the dog simultaneously and continuously (dog's name, then the call) for ∼3 s. They were instructed to do their best to motivate the dog to approach them; they could use body gestures and modulate the tone of their voice, but they were not allowed to leave the place where they stood. The only criterion was to start by calling the dog by its name. Before this phase, the owner was also asked to tell the control person the verbal commands he/she usually uses to call the dog, and the control person used the same commands. By using the same verbal commands, we aimed to ensure that the choice reflected the dog's preference for a person (owner or control person) rather than the familiarity of the call. We note, however, that the calling could also be influenced by the context, as dogs are usually expected to follow their owner's commands. This possible context dependence would not have constituted a confound in this study, since dogs always had to find their owner. After ∼3 s, the experimenter released the dog's collar. If the dog went to its owner, the test was scored 1; if it chose the control person, it was scored 0. If the dog did not choose within ∼15 s, the test was considered invalid (this happened with only one dog). A choice was defined as approaching a person to within 0.5 m. The chosen person petted the dog and praised it verbally but did not give it any food. The test consisted of a single trial because we wanted to avoid confusing the dogs before the voice discrimination test by training them in a different experimental arrangement. Door "G" was used to enter the room before the test and to leave it afterward. Door "I" was never used for crossing between the test rooms, to avoid side biases during the training and the voice discrimination test (Fig. 1; Table 1).
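The preference-test scoring rule above can be written as a small decision function. This is a minimal sketch, not the authors' procedure: the function name `score_preference`, its arguments, and recording distance and latency as numbers are hypothetical; only the 0.5 m approach criterion, the ∼15 s limit, and the 1/0 coding come from the text.

```python
from typing import Optional

# Thresholds from the protocol text; the data layout itself is hypothetical.
CHOICE_RADIUS_M = 0.5  # a choice = approaching a person to within 0.5 m
TIMEOUT_S = 15.0       # no choice within ~15 s -> invalid test

def score_preference(approached: Optional[str], closest_m: float, latency_s: float) -> Optional[int]:
    """Return 1 (owner chosen), 0 (control person chosen), or None (invalid test).

    approached: "owner", "control", or None if the dog approached nobody.
    closest_m: closest distance (m) the dog came to that person.
    latency_s: seconds from release until the approach.
    """
    if approached not in ("owner", "control", None):
        raise ValueError(f"unknown person: {approached!r}")
    if approached is None or closest_m > CHOICE_RADIUS_M or latency_s > TIMEOUT_S:
        return None  # the dog did not make a valid choice
    return 1 if approached == "owner" else 0

print(score_preference("owner", 0.3, 4.0))    # 1
print(score_preference("control", 0.2, 6.5))  # 0
print(score_preference("owner", 0.9, 3.0))    # None (never came within 0.5 m)
```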
Training. The training was performed 1-3 min after the preference test and consisted of four trials. The aim was to familiarize the dogs with the experimental conditions of the voice discrimination test. Before the training, the owner led the dog around the test room, including the area behind the screens, while the experimenter explained the protocol to the owner. This part lasted about 3-4 min.
During the trials, the owner hid semi-randomly behind one of the two opaque (blue) screens (the owner hid on the same side at most two times in a row, and the hiding side was balanced across trials). After the exploration and between trials, the owner waited at the starting location ("F"), while the experimenter took control of the dog and the two of them left the room through door "H." In their absence, the owner hid behind the predetermined screen (in a crouching position), and the trial started. The experimenter led the dog in and stood at location "F" with the dog sitting in front of him/her, holding its collar or leash. Again, the dog's task was to find its owner on the basis of his/her unique voice. The training was carried out with the owner only (never with the control person) because we assumed that (a) dogs already know their owner's voice and (b) there is an attachment relationship between the owner and the dog (Topál et al., 1998). In trials 1 and 2, the acoustic stimulus was the dog's name followed by a call (owners were instructed to use their usual verbal commands), and in trials 3 and 4, owners read aloud sentences from a food recipe in a neutral voice.
If the dog failed to choose or did not start searching, the experimenter led it to the owner. When the dog found the owner, it received a small food reward. The four training trials were followed by the voice discrimination test without any delay. Dogs that did not move away from the experimenter at the start of the trial in more than two trials were not included in the voice discrimination test (N = 2).
Voice discrimination test. The protocol was the same as in the training phase, but in this case two persons (the owner and a control person) hid simultaneously behind the screens (semi-randomly and balanced) (Fig. 1). To prevent the dog from finding its owner by the scent of the food reward, the control person also held food (in a plastic box similar to the one held by the owner). If the dog found the control person, it was not rewarded: the control person stood up and turned his/her back to the dog, and the dog was not allowed to make contact with the owner (the experimenter took the dog's leash and did not let it go to the owner). This phase consisted of six trials. The owner and the control person took turns speaking, as in a conversation, each reading two recipe sentences. The last speaker and the hiding places ("D" or "E") were semi-randomized and balanced across trials (the same person hid behind the same screen at most two times in a row). Choosing the owner was scored 1, and choosing the control person was scored 0 (Fig. 1; Table 1).
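The counterbalancing constraint used for the hiding side (and, in the voice discrimination test, for the last speaker) can be met by rejection sampling: shuffle a balanced list of levels until no level occurs more than twice in a row. This is a minimal sketch under stated assumptions: the text does not say how the sequences were generated, "balanced" is taken here to mean equal counts per level (3/3 in the six test trials, 2/2 in the four training trials), and the function names are hypothetical. The control person is assumed to take whichever screen the owner does not.

```python
import random

def max_run(seq):
    """Length of the longest run of identical consecutive items (0 for an empty sequence)."""
    if not seq:
        return 0
    longest = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest

def balanced_sequence(levels, n_trials, max_repeat=2, rng=random):
    """Random order with equal counts per level and no level repeated more than max_repeat times in a row."""
    if n_trials % len(levels):
        raise ValueError("n_trials must be a multiple of the number of levels")
    seq = list(levels) * (n_trials // len(levels))
    while True:  # rejection sampling: reshuffle until the run constraint holds
        rng.shuffle(seq)
        if max_run(seq) <= max_repeat:
            return list(seq)

rng = random.Random(2024)  # fixed seed so that a schedule can be reproduced
owner_screen = balanced_sequence(["D", "E"], 6, rng=rng)  # control person takes the other screen
last_speaker = balanced_sequence(["owner", "control"], 6, rng=rng)
for trial, (screen, speaker) in enumerate(zip(owner_screen, last_speaker), start=1):
    print(f"trial {trial}: owner behind {screen}, last speaker: {speaker}")
```

Rejection sampling is cheap here: only 20 orderings of three "D" and three "E" exist, and 14 of them satisfy the at-most-two-in-a-row rule.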

Data analyses
As all dogs but one chose the owner in the preference test, that single dog was excluded from the analysis. Data of the voice discrimination test were analyzed with IBM SPSS Statistics 25 (Armonk, NY, USA). First, to examine whether dogs were able to identify their owner on the basis of his/her voice, we calculated each dog's owner-choice rate from its scores in the six trials (range: 0-1; 0: the control person was chosen in all six trials; 1: the owner was chosen in all six trials). A Shapiro-Wilk test was used to test the normality of the owner-choice rate, and a one-sample Wilcoxon signed-rank test to test whether the median differed from chance level (0.5). Second, one-sample binomial tests were used to analyze the dogs' choices separately in each trial. Third, a binary generalized linear mixed model (GLMM) was used to detect possible confounding effects on owner choice (dependent variable). Dog identity was entered into the GLMM as the subject and as a random factor, and trial as a repeated measure. Side, last speaker, sex, trial, and age were included in the model as independent variables. Age was included to detect possible effects of cognitive and sensory impairment on the dogs' performance, which also provides information on the reliability of the experimental setup (Table 3).
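To make the first analysis step concrete, the sketch below computes each dog's owner-choice rate from its six 1/0 scores and the signed-rank sum W+ of a one-sample Wilcoxon test against 0.5 (zero differences dropped, tied absolute differences given their average rank). It is an illustration only: the authors used SPSS, the scores are hypothetical rather than the study's data, the function names are invented for this example, and the p-value is left to a statistics package (e.g., `scipy.stats.wilcoxon` applied to the differences from 0.5). Exact fractions are used so that equal distances from 0.5 (such as 2/3 and 1/3) are recognized as ties.

```python
from fractions import Fraction

def owner_choice_rate(scores):
    """Share of trials in which the owner was chosen (1 = owner, 0 = control person)."""
    return Fraction(sum(scores), len(scores))

def wilcoxon_w_plus(rates, mu=Fraction(1, 2)):
    """Sum of positive signed ranks (W+) and the number of non-zero differences."""
    diffs = [r - mu for r in rates if r != mu]  # dogs exactly at chance are dropped
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        # Extend j over the block of tied absolute differences starting at i.
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based; ties share their mean rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    w_plus = sum(rank for rank, d in zip(ranks, diffs) if d > 0)
    return w_plus, len(diffs)

# Hypothetical scores for four dogs (six trials each), not data from the study.
scores = [[1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 0, 0], [1, 0, 0, 0, 0, 0]]
rates = [owner_choice_rate(s) for s in scores]
print([str(r) for r in rates])  # ['1', '2/3', '1/2', '1/6']
print(wilcoxon_w_plus(rates))   # (4.0, 3)
```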

RESULTS
The owner-choice rate over the whole test (mean ± SD = 0.75 ± 0.21) did not follow a normal distribution (W = 0.854, df = 27, p = .001) and differed significantly from chance level (W = 325.5, p < .001). On average, dogs chose their owner in 75% of the six trials. Twenty-three of the 27 dogs chose their owner in more than half of the trials, and only three dogs chose the control person in more than half of the trials. In the 1st, 3rd, 4th, and 5th trials, dogs chose the owner significantly more often than the control person, whereas in the 2nd and 6th trials, their owner choices did not differ significantly from chance level (Fig. 2; Table 2). According to the GLMM (Table 3), owner choice was not affected by the hiding side (the owner-choice rate was 0.75 and 0.73 when the owner hid behind the right and the left screen, respectively) or by the last speaker (the owner-choice rate was 0.73 and 0.75 when the owner and the control person was the last speaker, respectively). Additionally, we found no learning effect; that is, the GLMM did not reveal an effect of trial. The age and sex of the dogs also had no effect on their performance in the test.
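The per-trial and per-dog binomial tests can be checked with an exact calculation. The minimal sketch below assumes a two-sided test against p = 0.5 and α = .05; because the distribution is symmetric at p = 0.5, doubling the smaller tail gives the exact two-sided p-value (for p = 0.5 this should coincide with the two-sided result of `scipy.stats.binomtest`). The function names are hypothetical, and this is not the SPSS routine the authors used. It shows how many owner choices are needed for significance with 27 dogs per trial and with six trials per dog.

```python
from math import comb

def binom_two_sided_p(k, n):
    """Exact two-sided binomial p-value for k successes in n trials against p = 0.5."""
    lower = sum(comb(n, i) for i in range(k + 1)) / 2 ** n     # P(X <= k)
    upper = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n  # P(X >= k)
    return min(1.0, 2 * min(lower, upper))

def min_significant_count(n, alpha=0.05):
    """Smallest number of owner choices out of n that is significant at alpha."""
    for k in range((n + 1) // 2, n + 1):
        if binom_two_sided_p(k, n) < alpha:
            return k
    return None  # n too small for any significant result

print(min_significant_count(27))            # 20 of 27 dogs in a single trial
print(round(binom_two_sided_p(19, 27), 4))  # 0.0522
print(min_significant_count(6))             # 6 of 6 trials for a single dog
```

Under these assumptions, a trial is significant only if at least 20 of the 27 dogs (about 74%) choose the owner, and with six trials per dog only a dog that chooses the same person in all six trials reaches individual significance.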

DISCUSSION
After some experience with the experimental context, family dogs could match their owner with his/her individual voice. Only three dogs went to the control person in more trials than to their owner, and age did not affect the results. This experiment shows that dogs are able to discriminate their owner from other people by voice. Although it is likely that dogs recognize their owner, this study does not provide unequivocal evidence of IR, because dogs could have relied more on the familiarity than on the individual specificity of the cues carried by the owner's voice. Thus, while this method may be extended to test dogs' ability for IR of humans, additional findings provide complementary insight into their performance in comparable situations. The lack of a learning (trial) effect and the fact that dogs' performance in the first trial (when they encountered the two voices within one trial for the first time) was similar to that in the following trials suggest that dogs already knew their owner's voice before the test. This means that dogs relied on their previous experience with their owner's voice throughout the test. Huber et al. (2013) showed that after extensive training, some dogs were able to discriminate between their owner's face and a familiar person's face; however, only two dogs were able to rely on inner-face cues to find their owner. There is also plenty of experimental evidence that trained dogs are able to match specific human odors at the individual level (e.g., Pinc et al., 2011; Polgár et al., 2015). Specifically, trained working dogs are able to find the matching pair of a human odor sample among eight unfamiliar samples (e.g., Jezierski et al., 2014). Although this does not constitute IR in the traditional sense, these results show that dogs have the perceptual and cognitive skills required to detect minute "fingerprint-like" odor differences and to make an appropriate choice under experimental conditions (in match-to-sample tasks; Pinc et al., 2011).
There is some evidence that dog barks contain enough information for IR. Using different machine-learning approaches, both Molnár et al. (2008) and Larrañaga et al. (2014) were able to recognize individual dogs with accuracy well above chance level. In the former study, mean formant frequency and overall harmonicity were among the variables that the software selected for successful recognition. It should be noted that to achieve such high recognition rates, these algorithms are trained on a large number of samples using very efficient learning paradigms. This means that such programs may perform close to, or even better than, human experts, rather than at the level of laypeople.
Experience also appears to be important in the case of domestic dogs. For example, dogs with an owner of a particular sex were better at recognizing emotional cues displayed on the face of a person of the same sex than of the opposite sex (Nagasawa et al., 2011). One possible explanation is that close contact between owner and dog over many years may facilitate learning about the nuances of facial changes associated with specific emotions, and that dogs were less able to generalize this knowledge to the other sex. Similarly, dogs' performance in matching male and female persons with vocalizations of the same sex also depended on experience: dogs living in larger human families, and presumably having more experience with both sexes, were more successful in this task (Ratcliffe et al., 2014).
Strictly speaking, IR should reflect the ability to tell single individuals apart on the basis of one or more idiosyncratic cues. This means that IR can be viewed as a special case of SCR in which each category has only a single member. Thus, the ability for IR should be clearly separated from recognizing other social categories, e.g., infant-juvenile-subadult-senior, male-female, dominant-subordinate, etc. Unfortunately, many behavioral studies do not aim to separate the ability to discriminate among members of specific social categories from the ability to behave differentially toward specific individuals on the basis of IR. A typical error is the lack of control for the degree of familiarity in the experimental design, so that discrimination between the focal individual and the other individual can be based on the amount of previous experience. For example, mothers' specific reaction toward their own offspring, as opposed to the offspring of other mothers, does not need to involve IR. Familiarity and/or similarity (e.g., odors controlled by shared genetic factors) can also explain mothers' choice of the more familiar offspring; thus, it more likely represents SCR than IR. Because of the differences in familiarity between the speakers and the fact that only one individual represented the social category of "owner," the design of the current experiment does not allow us to distinguish between SCR and IR, but our results suggest that dogs are able to match their owner to his/her individual voice.

CONCLUSION FOR FUTURE BIOLOGY
There is still a need for additional efforts to show that dogs (or members of other species) are able to represent a category that consists of a single member: the individual. This ability may rely on some specific expertise, that is, the perception of individually distinctive cues, the recognition of phenotypic invariance (e.g., bodily cues vs. haircut in the case of dogs; Mongillo et al., 2017), and some form of cross-modal association. Experiments also have to show that the subjects are able to distinguish between individuals belonging to the same social class. Our experimental design offers a first step in this direction, especially for investigating how different acoustic variables in the human voice may affect dogs' choice of their owner.

Fig. 1. Illustration of the experimental setup. Small (a) and large (b) test rooms. A-B/D-E: locations of the owner and the control person during the preference test/during the training and the voice discrimination test; C/F: location of the experimenter and the dog during the preference test/during the training and the voice discrimination test; G, H, I: doors.

Fig. 2. Owner-choice rate in the voice discrimination test. X-axis: owner-choice rate. Y-axis: black indicates the mean owner-choice rate for the whole test; dark gray indicates the performance of individual dogs; light gray indicates group results for each of the six trials. Asterisks above the mean column show the result of the one-sample Wilcoxon test on dogs' owner-choice rate (N = 27 dogs). Asterisks above the dark and light gray columns show the results of one-sample binomial tests on dogs' individual choices (N = 6 trials per dog) and on the trial-based owner-choice rates (N = 27 dogs per trial). Striped columns show dogs with male owners. Error bars represent SEM. *p < .05. **p < .005. ***p < .001.

Table 1. Experimental protocol. This table indicates the number of trials, stimulus types, and persons involved in the different test phases.

Table 2. Owner-choice rate per trial in the voice discrimination test.

Table 3. Effect of the independent variables on owner choice in the voice discrimination test. This table shows the GLMM results.