Abstract
This study investigated the production of Mandarin and Fuzhou lexical tones by Mandarin-Fuzhou bilingual children. Forty children aged 6;11 to 7;6 and two groups of adults (Mandarin speakers and Fuzhou speakers) were asked to produce pre-selected familiar monosyllabic words. Adult judges' perceptual judgments and acoustic analysis showed that: (1) overall, these children's production performance of Mandarin tones was similar to adults', with very high accuracy; (2) children did not reach adult-like production competence in Fuzhou tones by age 7;6; and (3) there was an imbalance in children's development of the seven lexical tones in Fuzhou. Children's late and unbalanced development of Fuzhou tones could be ascribed to their unbalanced Mandarin-Fuzhou exposure, and it is argued that children might transfer the characteristics of the Mandarin tonal system to their production of Fuzhou tones.
1 Introduction
This study is a cross-sectional investigation into the production of Mandarin and Fuzhou lexical tones by Mandarin-Fuzhou simultaneous bilingual children.1 The Fuzhou dialect, also known as the Foochow dialect or Fuzhounese, is the representative dialect of the Eastern Min dialect group of Chinese, which has a complex phonological system with seven lexical tones and a large number of tone sandhi patterns. Nevertheless, to the best of our knowledge, there is no published study on children's tone acquisition in the Fuzhou dialect, and the existing studies on children's acquisition of Chinese tones have mainly focused on Mandarin (e.g., C. N. Li & Thompson 1977; Zhu & Dodd 2000; Wong, Schwartz & Jenkins 2005; Wong & Strange 2017; R. Xu et al. 2018), Cantonese (e.g., Tse 1978; So & Dodd 1995; To, Cheung & McLeod 2013; Wong, Fu & Cheung 2017; Wong & Leung 2018; Mok, Fung & Li 2019; Mok, Li & Fung 2020), and Southern Min (or Taiwanese) (e.g., Hsu 1989; Tsay 2001). These previous studies on monolingual children's lexical tone development, however, have resulted in different findings with respect to children's age of acquisition of lexical tones in a given language. In addition to the studies on monolingual children, there have been a few studies on lexical tone development of bilingual children, which have been restricted to children who acquire Cantonese and another language, e.g., Cantonese-English (Light 1977; Holm & Dodd 1999, 2006; Mok & Lee 2018), Cantonese-Mandarin (Law & So 2006; So & Leung 2006), and Urdu-Cantonese (Yao et al. 2020), and have arrived at different conclusions on whether the effect of cross-linguistic transfer can be observed in the tone production of bilingual children. 
This study examined the production accuracy and acoustic features of both Mandarin and Fuzhou tones by Mandarin-Fuzhou bilingual children aged 6;11 to 7;6, with the aim of filling the research gap on children's acquisition of Fuzhou tones and improving our understanding of cross-linguistic transfer in bilingual tone acquisition.
1.1 Children's production of lexical tones in Chinese
Chinese languages are well known to be tonal, which means that changes in the tone of a syllable can lead to changes in the lexical meaning. A number of studies have been conducted on monolingual children's tone production in Mandarin Chinese (e.g., C. N. Li & Thompson 1977; Zhu & Dodd 2000; Wong, Schwartz & Jenkins 2005; Wong & Strange 2017; R. Xu et al. 2018), Cantonese (e.g., Tse 1978; So & Dodd 1995; To, Cheung & McLeod 2013; Wong, Fu & Cheung 2017; Wong & Leung 2018; Mok, Fung & Li 2019; Mok, Li & Fung 2020), and Southern Min/Taiwanese (e.g., Hsu 1989; Tsay 2001), but they have revealed conflicting pictures of children's tone acquisition process in these Chinese languages.
Earlier research on lexical tone development of monolingual children has shown that tones are acquired rapidly and early in Chinese children, generally before the age of 3. For example, C. N. Li & Thompson (1977) collected longitudinal data from 17 Mandarin-speaking children in Taiwan aged 1;6 to 3;0 and showed that these children were able to produce all the four lexical tones in Mandarin accurately when they could produce sentences longer than two or three words. Zhu & Dodd's (2000) cross-sectional study of 129 Mandarin-speaking children in Beijing aged 1;6 to 4;6 reported only two tone errors in all the production data from the picture-naming and picture description tasks and demonstrated that tone errors were rare even in the youngest group of children (range 1;6–2;0). Similar findings on Mandarin monolingual children were obtained in longitudinal studies in Chao (1951), Clumeck (1980), and Zhu (2002) as well.
In addition to Mandarin, early acquisition of tones by monolingual children was also reported in earlier studies on Cantonese and Southern Min/Taiwanese. For example, Tse's (1978) longitudinal case study of a Cantonese-speaking child showed that the time span from the child's first uttered tone to the latest acquired tone covered a total period of only eight months, from 1;2 to 1;9. So & Dodd (1995) carried out a longitudinal study on four children aged 1;2 to 2;0 and reported that they acquired all the six lexical tones in Cantonese by 2;0. So & Dodd's (1995) cross-sectional picture-naming data from 268 Cantonese-speaking children (range 2;0–6;0) in Hong Kong revealed a similar picture, with only two children making tone errors in the study (one four-year-old made two and one five-year-old made three). To, Cheung & McLeod (2013) studied Cantonese tone production in 1,726 Hong Kong children aged 2;4 to 12;4 with a picture-naming task, and their results showed that children in the youngest group of participants (2;6) demonstrated 98% of correct tone production, which also supported early acquisition of Cantonese tones. For Southern Min/Taiwanese, both Hsu (1989) and Tsay (2001) investigated monolingual Taiwanese children's tone production and they reported similar findings. Hsu's (1989) 1.5-year longitudinal study on a child from the age of 1;4 to 2;10 showed that the child acquired all the lexical tones in Taiwanese very early (by 1;11), before the segments were completely acquired. Tsay's (2001) longitudinal data from 14 Taiwanese-speaking children (range 1;2–3;11) showed that children aged 2;1 to 2;3 made very few tone errors, and the production accuracy of almost all the children was higher than 90%.
More recent studies on Mandarin and Cantonese, however, suggested that monolingual children's lexical tone development was a more protracted process and children did not produce adult-like tones until after 5 or 6 years of age. A series of cross-sectional studies by Wong and her colleagues (e.g., Wong, Schwartz & Jenkins 2005; Wong 2012, 2013; Wong & Strange 2017) reported that three- to six-year-old Mandarin-speaking children in the U.S. and in Taiwan did not reach adult-like accuracy in their lexical tone production in the picture-naming task. R. Xu et al. (2018) showed that, although preschool children (range 3;5–5;11) in the study produced appropriate global tonal contours for all the four Mandarin lexical tones, their productions were not adult-like for the third tone (the low-dipping tone) and the fourth tone (the high-falling tone) and no consistent developmental changes across age could be found. Similarly protracted tone acquisition by monolingual children was reported in Cantonese. Wong, Fu & Cheung (2017) showed that children aged 3;1 to 3;11 produced Cantonese tones with low accuracy rates in a picture labeling task and did not produce any of the six lexical tones with adult-like accuracy. Using a similar method, Wong & Leung (2018) examined tone production of three groups of Cantonese-speaking children (4;0–4;10, 5;0–5;10, and 6;0–6;11) and found that 4- to 6-year-old children did not fully master the production of Cantonese tones in familiar monosyllabic words. Mok, Fung & Li (2019) collected production data from 111 Hong Kong Cantonese-speaking children spanning ages 2;1 to 6;0 with a picture-naming task and demonstrated that children's production accuracy was still not adult-like by age 6;0, although an age effect could be observed. As an expansion of Mok, Fung & Li (2019), Mok, Li & Fung (2020) studied 159 children (2;1–6;0) divided into eight 6-month age bands for finer analysis, and their results showed that only the production accuracy of the eldest group (5;7–6;0) could be considered close to adult-like.
As can be seen in the discussion above, there is a large discrepancy in the age of children's tone acquisition in Chinese languages in the earlier studies vs. the later studies. Such a discrepancy, as pointed out by several recent studies (e.g., Wong, Schwartz & Jenkins 2005; Wong 2012, 2013; Wong, Fu & Cheung 2017; Mok, Fung & Li 2019), can be explained by methodological differences in several aspects. First, many earlier studies examined children's tone productions in spontaneous connected speech without any control for production contexts, and imitated responses were used in some studies (e.g., So & Dodd 1995; To, Cheung & McLeod 2013), which may inflate the rating of children's tone productions. To control for the context of tone production, only tones of monosyllabic words produced in isolation were analyzed in most later studies (e.g., Wong, Schwartz & Jenkins 2005; Wong 2012, 2013; Wong, Fu & Cheung 2017; Wong & Leung 2018; Mok, Fung & Li 2019; Mok, Li & Fung 2020).
The second major difference between the earlier studies and the more recent studies lies in the number of adult native judges. In most earlier studies, children's tone accuracy was judged by only one adult judge, who was usually the author. With only one judge, it was impossible for the data to be cross-checked by other adult speakers. Although some earlier studies reported inter-rater reliability (e.g., So & Dodd 1995; Zhu & Dodd 2000; To, Cheung & McLeod 2013), only a small portion of their data were cross-checked. In contrast, later studies recruited multiple judges and statistically examined inter-rater reliability to ensure the consistency in the judges' judgments of children's tones.
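As an illustration of what statistically examining inter-rater reliability can involve, the sketch below computes Cohen's kappa for two judges who rated the same tokens. Cohen's kappa is used here only as an assumed example statistic; the studies reviewed above may have relied on other measures, and the function name and toy ratings are hypothetical.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who labelled the same tokens.

    Illustrative only: the reviewed studies do not necessarily use this statistic.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)
    # Proportion of tokens on which the two raters agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from each rater's label frequencies.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return float("nan")  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical ratings: 1 = tone judged correct, 0 = tone judged incorrect.
judge_a = [1, 1, 0, 0]
judge_b = [1, 0, 0, 0]
kappa = cohens_kappa(judge_a, judge_b)
```

Unlike raw percentage agreement, kappa discounts the agreement that two judges would reach by chance, which matters when nearly all tokens are judged correct.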
In addition, some earlier studies did not define explicitly the criteria for determining tone production mastery (e.g., Chao 1951; C. N. Li & Thompson 1977; Tse 1978), and the accuracy thresholds set up in previous studies for a tone to be considered acquired or stabilized are different. For example, in Zhu & Dodd (2000), the age of stabilization of a tone was defined as the youngest age at which 90% of the children in that age group attained 66.7% accuracy of the tone. A similar criterion was adopted in Zhu (2002), in which a tone was considered stabilized if a child produced the tone with 66.7% accuracy in his/her spontaneous speech and maintained at least that level in subsequent samples. Unlike earlier studies, later studies adopted higher thresholds. For example, none of the four Mandarin tones produced by the three groups of children (3-, 4-, and 5-year-olds) was considered to be adult-like in Wong (2013), although the fourth tone produced by these three groups was perceived by the judges with 67%, 78%, and 80% accuracy, respectively. Mok, Li & Fung (2020) even argued that only the eldest group (5;7–6;0) in their study, in which two thirds of the Cantonese-speaking children had an accuracy of over 90%, could be considered close to adult-like in tone production.
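The effect of the chosen threshold can be made concrete with a small sketch of the stabilization criterion described for Zhu & Dodd (2000): the youngest age group in which at least 90% of the children reach at least 66.7% accuracy on a tone. The function name, the age-group labels, and the per-child accuracy values are hypothetical and serve only to show how a stricter per-child threshold changes the outcome.

```python
def stabilization_age(groups, child_threshold=0.667, group_threshold=0.9):
    """Return the label of the youngest age group meeting the criterion.

    groups: list of (age_label, per_child_accuracies), ordered from the
    youngest to the oldest group. Returns None if no group qualifies.
    """
    for label, accuracies in groups:
        if not accuracies:
            continue
        # Share of children in this group at or above the per-child threshold.
        share = sum(acc >= child_threshold for acc in accuracies) / len(accuracies)
        if share >= group_threshold:
            return label
    return None


# Hypothetical per-child accuracies for one tone in two age groups.
groups = [
    ("1;6-2;0", [0.5, 0.7, 0.9, 0.6]),
    ("2;1-2;6", [0.7] * 9 + [0.5]),
]
lenient = stabilization_age(groups)                       # 66.7% per child
strict = stabilization_age(groups, child_threshold=0.9)   # 90% per child
```

With these made-up values the lenient criterion is met by the second group, whereas raising the per-child threshold to 90% leaves no group qualifying, which mirrors how later studies arrive at later ages of acquisition.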
Other important differences that may contribute to the large discrepancy include the recruitment of adult control groups and the use of acoustic analyses. Unlike earlier studies, many later studies (e.g., Wong 2012, 2013; Wong, Fu & Cheung 2017; Mok, Fung & Li 2019; Mok, Li & Fung 2020) analyzed adult speakers' productions and used them as a reference for children's productions, and also assessed children's production accuracy through acoustic analyses instead of through adult judges' ratings alone.
In a nutshell, the methods adopted in recent studies are generally more rigorous than those of earlier studies: they analyzed monosyllabic tones in isolation, employed multiple adult judges, set higher thresholds for tone production mastery, compared children's productions to adults', and assessed tone productions with acoustic analyses. There is no published study on children's tone acquisition in the Fuzhou dialect so far, and nowadays it is impossible to find monolingual Fuzhou-speaking children in the city of Fuzhou. Nevertheless, the findings and methods of previous studies on Mandarin, Cantonese, and Southern Min/Taiwanese, especially the more recent ones, offer insights for the investigation of Fuzhou children's tone production. The present study therefore adopts a similarly rigorous method, which is discussed in detail in Section 2.
1.2 Cross-linguistic transfer in bilingual tone acquisition
Both quantitative and qualitative differences between bilingual children's phonological development and that of monolingual children have been found in previous studies on bilingual phonology (e.g., De Houwer 1995; Romaine 2001). Most of the previous studies, however, have focused on bilingual children's segmental acquisition, while research on the suprasegmental aspects is lacking. Although there have been a few studies on lexical tone production of bilingual children who acquire Cantonese and a non-tonal language, e.g., Cantonese-English (Light 1977; Holm & Dodd 1999, 2006; Mok & Lee 2018) and Urdu-Cantonese (Yao et al. 2020), they have reached different conclusions on whether there is cross-linguistic influence or the effect of transfer in bilingual children's tone production.
Holm & Dodd (1999) conducted a longitudinal case study on two successive Cantonese-English bilingual children, aged 2;3–3;1 and 2;9–3;5, who were raised in almost exclusively Cantonese-speaking environments until they were 2 and 2;6 years old respectively. Although these two children's tone accuracy was monitored in their study, the errors were infrequent and thus were not discussed. Holm & Dodd's (2006) cross-sectional data from 40 successive Cantonese-English bilingual children (2;2–5;7) in Australia pointed to the same conclusion. They found that these bilingual children, who had started to speak Cantonese prior to exposure to English, were not different from their Cantonese monolingual counterparts in Hong Kong in terms of tone production accuracy, and only five of them made tonal errors.
In stark contrast to the above-mentioned studies, in which only a few tonal errors by bilingual children and no effect of transfer were reported, a delay in bilingual children's lexical tone development due to cross-linguistic influence was found in Light (1977), Mok & Lee (2018), and Yao et al. (2020). Light (1977) reported a case study on a successive Cantonese-English bilingual girl, who lived in a Cantonese-dominant environment until she moved to the United States at 16 months. The girl's Cantonese tone productions were not different from those of other Cantonese-native infants by the age of 19 months, but later her Cantonese underwent “tonal disintegration” and her tonal errors abounded between the ages of 2;6 and 3;0. Most traces of this tonal disintegration were gone by the age of four, and her tonal system was again integrated. Light argued that this tonal disintegration could be ascribed to the strong influence of English phonology, because many of the girl's incorrect tone productions reflected a pitch-contour approximation of the English equivalent items. Light also argued that the girl's two languages of exposure were confused in her performance for a certain period but they became distinguished subsequently, which was why the tonal disintegration was later rectified.
Mok & Lee's (2018) findings echoed Light's (1977) observations. They examined the production of Cantonese tones by five Cantonese-English simultaneous bilingual children at 2;0 and 2;6 (4 Cantonese-dominant and 1 English-dominant; born to mixed race parents in Hong Kong) in a longitudinal corpus, and compared them with their monolingual counterparts. A delay was found in some bilingual children's tone development at 2;0, while some bilingual children were on a par with their monolingual peers. Mok & Lee (2018) also noted the use of a “high-low” template by three of the five bilingual children, which resembled the dominant trochaic pitch pattern of English disyllabic words, showing that the bilingual children's lexical tone development in Cantonese was influenced by their simultaneous exposure to English.
Cross-linguistic influence was also reported in Yao et al.'s (2020) cross-sectional study on Cantonese tone production by Urdu-Cantonese bilingual children aged 4;5 to 6;6. They compared 21 Urdu-Cantonese bilingual children, who were born and raised in Urdu-speaking households with Urdu parents in Hong Kong, with 20 age-matched Cantonese-dominant children who had minimal exposure to other languages. The results of their picture-naming experiment showed that Urdu-Cantonese bilinguals were significantly less accurate than their Cantonese-dominant counterparts in all the six lexical tones in Cantonese, which, as argued by Yao et al., could be partially attributed to the lack of correspondence with Cantonese tonal categories in Urdu.
The discussion above shows that no consensus has been reached in the literature on bilingual children's lexical tone production in Cantonese, especially with respect to whether their tone development in Cantonese is delayed and influenced by the suprasegmental properties of the other language. Such a discrepancy could be attributable to the time of exposure of the bilingual children to their second language. For example, the two successive Cantonese-English bilingual children in Holm & Dodd's (1999) study were not exposed to English until they were 2 and 2;6 years old respectively, before which they were in almost exclusively Cantonese-speaking environments. If Cantonese tones have been acquired by monolingual children by 2;0 or 2;6, as reported in earlier studies (e.g., Tse 1978; So & Dodd 1995; To, Cheung & McLeod 2013), then it can be expected that the late exposure to English will not significantly affect children's production of Cantonese tones. In contrast, the bilingual child in Light (1977) was exposed to English after 16 months, which means that her English exposure coincided with her development of Cantonese tones. Similar interactions between Cantonese and English can also be expected in simultaneous bilingual children in Mok & Lee's (2018) study, which is why cross-linguistic transfer from English to Cantonese in children's tone production was observed in both Light (1977) and Mok & Lee (2018). As for bilingual children in Yao et al.'s (2020) study, Cantonese was their second language, so it is not surprising at all that the development of their Cantonese tones was influenced by their first language, Urdu.
Besides the studies on bilingual children acquiring Cantonese and a non-tonal language, there have been a few studies on lexical tone production of bilingual children who acquire two tonal languages simultaneously, e.g., Cantonese-Mandarin (So & Leung 2006; Law & So 2006) and Mandarin-Southern Min (X. Li 2020). So & Leung (2006) collected spontaneous speech samples and picture-naming samples from 40 Mandarin-dominant Cantonese-Mandarin bilingual children aged from 2;6 to 5;6 in Shenzhen, and reported, without further discussion, that only 5% of the subjects made errors in their Cantonese tone production. Using a similar method, Law & So (2006) investigated the production of both Cantonese and Mandarin tones by 100 Cantonese-Mandarin bilingual children (2;6–4;11) living in Hong Kong or Shenzhen, who were classified as either Cantonese-dominant or Mandarin-dominant according to their questionnaire data. Law & So (2006) found that, in terms of Cantonese tones, the Cantonese-dominant bilinguals did not produce any errors, and only two errors were made in the youngest group of Mandarin-dominant bilinguals (2;6–2;11). As for Mandarin tones, Mandarin-dominant bilinguals did not make any errors, and only nine errors were made by the youngest Cantonese-dominant bilinguals. X. Li (2020) examined the production of Xiamen lexical tones by 29 Mandarin-dominant Mandarin-Southern Min bilingual children (5;6–12;2), who were born and raised in Xiamen. Her cross-sectional data from picture-naming tasks showed that these children had very high accuracy for all the seven lexical tones in the Xiamen dialect. As these three studies all reported rare errors in bilingual children's tone production, no cross-linguistic tonal influence was found between two tonal languages.
Like Law & So (2006), So & Leung (2006), and X. Li (2020), the present study investigated the tone production of bilingual children acquiring two tonal languages. Children recruited in this study were Mandarin-Fuzhou bilinguals who were exposed to Mandarin and the Fuzhou dialect simultaneously from birth. However, it is noteworthy that although the bilingual children we recruited were considered by their parents to have better proficiency in the Fuzhou dialect than their peers, our questionnaire survey results indicate an obvious imbalance in the input of Mandarin and the Fuzhou dialect: the input of the dominant language, Mandarin, far outweighs that of the less dominant language, Fuzhou, as will be further explained in Section 2.1. In contrast to children recruited from Hong Kong and Shenzhen in Law & So (2006) and So & Leung (2006), as well as children recruited from the rural areas of Xiamen in X. Li (2020), Mandarin-Fuzhou bilingual children in the urban areas of Fuzhou have had fewer opportunities to be exposed to the dialect, and the higher degree of imbalance in the dual input could be viewed as one important distinction between the bilingual children in our study and those in the aforementioned research.2 Notably, there has been a lack of formal investigation into the tone production of bilinguals with such a background, and the present study is designed to fill this crucial research gap.
1.3 The current study
The review of literature reveals that previous studies have been restricted to lexical tone development of children acquiring Mandarin, Cantonese, and Southern Min/Taiwanese, and most studies have focused on preschool children under the age of 7 years. Despite decades of research on children's lexical tone production, two outstanding issues remain unaddressed: (i) when children reach adult-like accuracy in their lexical tone production in a particular Chinese language; and (ii) whether cross-linguistic transfer could be observed in bilingual children's tone production. In the present study, we tried to address these two issues through the examination of tone production by children aged 6;11 to 7;6 who acquire Mandarin and the Fuzhou dialect simultaneously from birth.
There are seven lexical tones in the Fuzhou dialect, including five non-checked tones (T1: [44] high-level, T2: [53] high-falling, T3: [31/33] mid-falling or mid-level, T4: [21/213] low-falling or low-dipping, and T5: [242] rising-falling), and two checked tones that only appear in syllables ending with a glottal stop [-ʔ] (T6: [23] rising-stopped, and T7: [5] high-stopped) and are much shorter in duration as compared to the non-checked tones (L. Chen & Norman 1965; Chan 1980; R. Li et al. 1994; Z. Chen 1998; Donohue 2013; You 2018, 2020; among many others).3 In contrast, there are only four lexical tones in Mandarin (T1: [55] high-level, T2: [35] mid-rising, T3: [214/21] low-dipping or low-falling, and T4: [51] high-falling). Besides the number of lexical tones, Fuzhou and Mandarin have at least three other major differences in terms of their tonal systems: (i) Fuzhou has tonal contrast between non-checked tones and checked tones, which does not exist in Mandarin; (ii) Fuzhou has a mid-level or mid-falling tone (Fuzhou T3) while Mandarin does not; (iii) Fuzhou has two complex contour tones (Fuzhou T4 and T5) while Mandarin has only one (Mandarin T3).
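The two inventories described above can be written down as a small data structure, which makes the stated contrasts easy to check. In this sketch the dictionary layout and function names are hypothetical, and a "complex contour" is operationalized, as an assumption, as any tone with a three-digit Chao value among its variants.

```python
# Tone inventories as given in the text: Chao tone values plus a checked flag.
FUZHOU = {
    "T1": {"chao": ["44"], "checked": False},         # high-level
    "T2": {"chao": ["53"], "checked": False},         # high-falling
    "T3": {"chao": ["31", "33"], "checked": False},   # mid-falling or mid-level
    "T4": {"chao": ["21", "213"], "checked": False},  # low-falling or low-dipping
    "T5": {"chao": ["242"], "checked": False},        # rising-falling
    "T6": {"chao": ["23"], "checked": True},          # rising-stopped
    "T7": {"chao": ["5"], "checked": True},           # high-stopped
}
MANDARIN = {
    "T1": {"chao": ["55"], "checked": False},         # high-level
    "T2": {"chao": ["35"], "checked": False},         # mid-rising
    "T3": {"chao": ["214", "21"], "checked": False},  # low-dipping or low-falling
    "T4": {"chao": ["51"], "checked": False},         # high-falling
}


def checked_tones(inventory):
    """Tones that occur only in syllables ending with a glottal stop."""
    return [tone for tone, info in inventory.items() if info["checked"]]


def complex_contours(inventory):
    """Tones with a three-point contour variant (assumed operationalization)."""
    return [tone for tone, info in inventory.items()
            if any(len(value) == 3 for value in info["chao"])]


fuzhou_checked = checked_tones(FUZHOU)        # ['T6', 'T7']
mandarin_checked = checked_tones(MANDARIN)    # []
fuzhou_complex = complex_contours(FUZHOU)     # ['T4', 'T5']
mandarin_complex = complex_contours(MANDARIN) # ['T3']
```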
Given that Mandarin and Fuzhou have different tonal inventories, with distinct pitch patterns and tonal contrasts, studying Mandarin-Fuzhou bilingual children allows us to explore how these children produce the unique tonal contrasts in each language and how they manage potential interference between the two tonal systems. By comparing the tonal development of Mandarin-Fuzhou bilingual children with findings in previous studies on monolingual children and other bilingual children, we can gain a better understanding of how bilingualism affects the acquisition of tonal distinctions.
Since Fuzhou has a much more complex tonal system than Mandarin, and Mandarin-Fuzhou bilingual children in this study have had unbalanced exposure to these two languages, we could expect a delay in these children's lexical tone development in the Fuzhou dialect and an effect of cross-linguistic transfer, with their Fuzhou tone productions being overall less accurate than their Mandarin tone productions. Specifically, by investigating bilingual children's production accuracy of Fuzhou and Mandarin tones with a familiar word elicitation task, we aimed at answering the following research questions in this study: (i) Would Mandarin-Fuzhou bilingual children aged 6;11 to 7;6 reach adult-like accuracy in their production of Mandarin and Fuzhou tones? (ii) Are there qualitative and quantitative differences in these children's production between Mandarin tones and Fuzhou tones? Would these children show a delay in their Fuzhou tone production as compared to their Mandarin tone production because of unbalanced exposure? If so, (iii) what are the properties of their production of Fuzhou tones and to what extent are these properties derived from the differences between the two tonal systems and from the influence of the dominant language Mandarin?
2 Methods
2.1 Participants
Forty Mandarin-Fuzhou bilingual children (20 girls, 20 boys; aged 6;11 to 7;6, average 7;3) participated in this study with written informed consent from their parents. These children were all born in Fuzhou and raised in Mandarin-Fuzhou bilingual families, with no experience of living or studying in other places for an extended period of time. Since the dialects spoken in the rural areas of Fuzhou are quite different from the Fuzhou dialect spoken in the urban areas and the accents may even vary in different regions within the city of Fuzhou, we only recruited children from the Gulou District (downtown Fuzhou). All the children were typically developing, with no reported developmental delay or speech, hearing, and/or learning disorders.
According to the data collected from the questionnaires, these children were all Mandarin-dominant bilinguals who acquired Mandarin and the Fuzhou dialect simultaneously from birth and had very limited exposure to other languages or dialects. The questionnaire survey results revealed an imbalance in their exposure to Mandarin and the Fuzhou dialect. Mandarin was always the dominant language in their life and the dominance of Mandarin even increased as they grew older. These children were attending local elementary schools, where Mandarin was used as the medium of instruction, and had all just completed Grade 1 at the time of testing. All the children exclusively used Mandarin with their peers and teachers at school. In their home environment, their parents and grandparents employed both Mandarin and the Fuzhou dialect to communicate with them. Among the surveyed participants, 32 children's parents and grandparents reported a usage ratio of approximately 60% for Mandarin and 40% for the Fuzhou dialect. Five children's parents and grandparents reported a usage ratio of around 70%–30%. Only three children's parents and grandparents mentioned a usage ratio of roughly 50%–50%. Despite the unbalanced dual input, the parents of these bilingual children believed that these children's proficiency in the Fuzhou dialect exceeded that of their peers, as these children demonstrated the ability to use the dialect in describing common objects and actions in everyday life, as well as in engaging in basic daily conversations with their parents and grandparents.
This study also included two adult reference groups for comparison. Ten recruited adults were Mandarin native speakers (5 females, 5 males; 7 from Beijing, 2 from Henan, and 1 from Hebei; aged 35 to 52, average 41) and the other ten were Fuzhou native speakers (5 females, 5 males; all from the Gulou District of Fuzhou; aged 46 to 66, average 59). The adult Fuzhou speakers were Fuzhou-Mandarin bilinguals and reported the Fuzhou dialect to be their dominant language.4 The adult participants did not speak languages other than Mandarin and the Fuzhou dialect, and all rated themselves as highly proficient in their native/dominant language.
2.2 Materials
Unlike previous studies, we did not collect children's production data from picture-naming tasks or from spontaneous connected speech. For one thing, children in this study had completed Grade 1 at the time of testing, so they were able to recognize and read characters from word lists; for another, Fuzhou tones can maintain their citation tonal values only when occurring on monosyllabic words or on the final syllable in a given domain. When a tone is followed by another tone in a domain containing more than one syllable, the non-terminal tone usually undergoes tone sandhi (L. Chen & Norman 1965; Chan 1980; H. M. Zhang 1992, 2017; R. Li et al. 1994; Z. Chen 1998; Donohue 2013; You 2018, 2020; among others). Even when a tone occurs in the domain-final position in connected speech, its acoustic properties are still likely to be influenced by the preceding tone and/or the intonation of the sentence. Therefore, to control for the context of tone production, we elicited productions of Mandarin and Fuzhou lexical tones by using lists of pre-selected familiar monosyllabic words in this study. To make sure that the Chinese characters in the word lists could all be recognized by the children, we selected the characters from Grade 1 elementary Mandarin Chinese language textbooks. Sixteen characters (i.e., 16 monosyllabic words) were selected for each lexical tone in the Fuzhou dialect and the Fuzhou word list thus included 112 words in total (16 words × 7 tones). The Fuzhou word list was checked by a group of Fuzhou parents before the production experiment and these Fuzhou words were judged to be familiar concepts to 6- to 7-year-old children, for example, 风 ‘wind’, 头 ‘head’, 手 ‘hand’, and 菜 ‘vegetables; dishes’.
The Mandarin word list was composed of 116 words, including the 112 characters used in the Fuzhou word list and 4 “supplementary” characters (for example, both 吃 and 食 mean ‘to eat’ in Mandarin while 吃 is never used in the Fuzhou dialect, so only 食 was included in the Fuzhou word list while both 吃 and 食 were included in the Mandarin list). These 116 words covered all the four lexical tones in Mandarin (24 words for T1, 27 for T2, 25 for T3, and 40 for T4; the numbers of words for different Mandarin tones were not the same due to the complicated corresponding relationship in the tonal category of characters between Mandarin and Fuzhou).
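The word-list sizes reported above can be verified with a few lines of arithmetic. The variable names are hypothetical, and the per-speaker token counts are an upper bound derived by assuming that every word is produced twice and no token is excluded.

```python
# Figures taken from the description of the two word lists.
FUZHOU_TONE_COUNT = 7
FUZHOU_WORDS_PER_TONE = 16
MANDARIN_WORDS_PER_TONE = {"T1": 24, "T2": 27, "T3": 25, "T4": 40}
REPETITIONS = 2  # each target word is produced twice in isolation

fuzhou_total = FUZHOU_TONE_COUNT * FUZHOU_WORDS_PER_TONE   # 16 x 7 = 112
mandarin_total = sum(MANDARIN_WORDS_PER_TONE.values())     # 116
supplementary = mandarin_total - fuzhou_total              # 4 extra characters

# Upper bound on tokens per speaker, before any exclusions.
max_fuzhou_tokens = fuzhou_total * REPETITIONS
max_mandarin_tokens = mandarin_total * REPETITIONS
```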
2.3 Procedure
Each child attended two sessions, one for Mandarin and the other for the Fuzhou dialect, and the Mandarin word elicitation task was carried out before the elicitation task of Fuzhou words. The participants completed the tasks individually in a quiet room of their school or community center.5 Before the production experiment, parents and/or grandparents gave written informed consent and were asked to fill out a questionnaire about the language background of themselves and their children as well as the children's use of language in various contexts. Then children received instructions about the experiment procedure,6 and practice trials were given prior to the actual elicitation tasks to facilitate the children's understanding of the procedure. When the children were ready for the tasks, they were presented with the word lists, in which the target words were randomly arranged so that the children would not produce words with the same tone consecutively. The children were asked to produce each target word twice in isolation and their productions were all digitally recorded using a Zoom H2n audio recorder in 16-bit PCM format at 44.1 kHz sampling frequency. Semantic/contextual prompts were offered by the experimenter if a child was not able to produce a target word when reading the word lists. If the probing strategies failed, the child was asked to repeat after the experimenter and then move on to the next word. This did not happen in the elicitation task of Mandarin words but happened occasionally for a couple of children when they tried to produce the Fuzhou words. These imitated responses were recorded during the experiment but were later excluded from data analysis to ensure that the accuracy of children's tone production would not be inflated.
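The text does not say how the randomized word order was generated. One possible way to obtain a random order in which no two consecutive words share a tone is sketched below; the function names and the placeholder word labels are hypothetical. At each step the sketch picks a tone different from the previous one, but only among tones whose choice still leaves the remaining words arrangeable, so it never reaches a dead end.

```python
import random
from collections import defaultdict


def feasible(counts, last):
    """Can the remaining words be ordered with no adjacent same-tone pair,
    given that the first remaining word must differ in tone from `last`?"""
    n = sum(counts.values())
    for tone, count in counts.items():
        limit = n // 2 if tone == last else (n + 1) // 2
        if count > limit:
            return False
    return True


def order_without_tone_repeats(words, seed=None):
    """words: list of (word, tone) pairs. Returns a randomized list in which
    consecutive items never share a tone."""
    rng = random.Random(seed)
    pools = defaultdict(list)
    for word, tone in words:
        pools[tone].append(word)
    for pool in pools.values():
        rng.shuffle(pool)
    counts = {tone: len(pool) for tone, pool in pools.items()}
    if not feasible(counts, None):
        raise ValueError("no ordering avoids consecutive same-tone words")
    ordered, last = [], None
    for _ in range(len(words)):
        candidates = []
        for tone in counts:
            if tone == last or counts[tone] == 0:
                continue
            counts[tone] -= 1            # tentatively place this tone
            if feasible(counts, tone):   # keep it only if the rest still works
                candidates.append(tone)
            counts[tone] += 1
        weights = [counts[tone] for tone in candidates]
        tone = rng.choices(candidates, weights=weights, k=1)[0]
        counts[tone] -= 1
        ordered.append((pools[tone].pop(), tone))
        last = tone
    return ordered


# Placeholder list mirroring the Fuzhou design: 16 words for each of 7 tones.
fuzhou_words = [(f"word{t}_{i}", f"T{t}") for t in range(1, 8) for i in range(16)]
fuzhou_order = order_without_tone_repeats(fuzhou_words, seed=1)
```

Plain shuffling followed by rejection would rarely succeed for a list of this size, which is why the sketch builds the order step by step instead.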
Each adult participant attended only one session, either for Mandarin or for the Fuzhou dialect. They completed the production experiment individually in a quiet room at home or in their community center. They were presented with the word list of their native/dominant language and were asked to produce each target word twice in isolation; their productions were also digitally recorded. No probing strategies were used, and no failure cases were observed, because the target words were all very familiar to the adult participants.
2.4 Perceptual judgment of the tone productions
Four phonetically trained adult speakers, who did not participate in the production experiment, were recruited as judges, two for Mandarin tone productions and the other two for Fuzhou tone productions. The two Mandarin judges were native Mandarin speakers, who were born and raised in Beijing, acquired Mandarin from birth, and passed the national proficiency test of Mandarin Chinese with Level 2-A (required for Chinese-language teachers). The two Fuzhou judges were native Fuzhou speakers, who were born and raised in the Gulou District of Fuzhou, acquired the Fuzhou dialect from birth, and reported the Fuzhou dialect as their dominant and strongest language. None of the judges had intellectual, hearing, speech, or language impairment.
The perceptual judgment for each language was conducted by two judges using unfiltered speech productions, following the unfiltered-two-judges condition proposed by Mok, Fung & Li (2019). The judges rated the tones independently: each listened to and perceptually evaluated the participants' tone productions of the target words. Tokens that were incomplete, inaudible, or not clearly produced, together with the imitated tokens, were excluded from the perceptual judgment. Among the valid tokens of each language, only those deemed correct by both judges were coded as 1 (correct) for the analysis of tonal production accuracy. If a token was perceived by at least one judge as containing an incorrect tone, it was coded as 0 (incorrect), and the judges were asked to identify the closest resembling tone. The judges' transcriptions of the closest resembling tone for erroneous productions were later used to examine the tonal error patterns.
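The coding rule described above can be sketched in a few lines of Python. This is only an illustration: the function names `code_token` and `accuracy`, the tuple layout, and the example tokens are hypothetical and are not taken from the study's materials.

```python
def code_token(judge_a_correct, judge_b_correct):
    """Return 1 only if both judges accepted the tone, otherwise 0."""
    return 1 if (judge_a_correct and judge_b_correct) else 0


def accuracy(tokens):
    """Share of valid tokens coded as 1.

    Each token is a hypothetical (judge_a_correct, judge_b_correct, is_valid)
    tuple; invalid tokens (incomplete, inaudible, unclear, or imitated) are
    dropped before scoring.
    """
    valid = [(a, b) for a, b, is_valid in tokens if is_valid]
    if not valid:
        return None
    return sum(code_token(a, b) for a, b in valid) / len(valid)


# Made-up tokens, for illustration only.
example = [
    (True, True, True),    # both judges accept -> coded 1
    (True, False, True),   # one judge rejects -> coded 0
    (True, True, False),   # excluded before scoring (e.g. imitated response)
]
print(accuracy(example))
```

Because a single rejection is enough to code a token as incorrect, this rule is conservative: the resulting accuracy can never exceed the accuracy assigned by either judge alone.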
Fleiss' kappas (Fleiss 1971; Fleiss, Levin & Paik 2003) were used to assess the level of agreement between the two Mandarin judges and between the two Fuzhou judges. The kappa coefficients of the two Mandarin judges regarding their judgments of the adults' and the children's Mandarin tone productions were 0.961 and 0.953, respectively. Similarly, the two Fuzhou judges achieved high kappa values for their evaluations of the adults' and children's Fuzhou tone productions, which were 0.932 and 0.895, respectively. These results indicate almost perfect agreement between the judges, according to the classification scale provided by Landis & Koch (1977).
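For readers unfamiliar with the statistic, the following is a minimal sketch of Fleiss' kappa following the standard textbook definition. The function name `fleiss_kappa` and the toy ratings are assumptions made for illustration; the study's actual ratings and software are not reproduced here.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each rated by the same number of judges.

    `ratings` is a list of per-token lists holding one category label per judge.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    categories = sorted({label for row in ratings for label in row})
    counts = [Counter(row) for row in ratings]

    # Overall share of all assignments that went to each category.
    p_cat = [sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories]
    # Observed agreement on each item.
    p_item = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters) / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_obs = sum(p_item) / n_items          # mean observed agreement
    p_exp = sum(p * p for p in p_cat)      # agreement expected by chance
    return (p_obs - p_exp) / (1 - p_exp)


# Toy data (not from the study): two judges, labels 1 = correct, 0 = incorrect.
toy = [[1, 1]] * 8 + [[0, 0]] + [[1, 0]]
print(round(fleiss_kappa(toy), 3))
```

Note that the statistic is undefined when every rating falls in a single category (the chance agreement is then 1), so a real analysis would need to guard against that case.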
2.5 Acoustic measurement
In addition to the obvious differences in production accuracy and error patterns, this study also found that even in the tone productions considered correct by the judges, there were some subtle discrepancies between adults and children, particularly in the production of Fuzhou lexical tones. To further investigate and compare adults' and children's tone productions, the acoustic parameters (F0 and duration) of perceptually correct tokens were examined. Specifically, the vowel portion of each token was manually segmented in Praat (Boersma & Weenink 2020) and the F0 data were extracted at 11 equidistant points in the vowel using ProsodyPro (Y. Xu 2013). Raw F0 values were then converted to logarithmic z-scores (LZ; see Rose 1987; Zhu 1999, 2005 for more discussion of F0 normalization), and normalized pitch curves in LZ, plotted against average duration, were created to examine and compare the dynamic changes of F0. In addition to F0, the duration of correctly produced tokens was measured in Praat, and t-tests were conducted to compare adults' and children's mean durations of each lexical tone.
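The normalization step can be illustrated with a short Python sketch. Several details are assumptions, since they are not spelled out above: the mean and standard deviation are taken per speaker over all of that speaker's log-transformed F0 samples, the sample standard deviation is used, and `sample_equidistant` is a simple linear-interpolation stand-in for time-normalized extraction rather than ProsodyPro's actual procedure. Both function names are hypothetical.

```python
import math
from statistics import mean, stdev


def sample_equidistant(f0_track, n_points=11):
    """Sample n_points equally spaced values (0%, 10%, ..., 100%) from an F0 track."""
    last = len(f0_track) - 1
    samples = []
    for i in range(n_points):
        pos = last * i / (n_points - 1)     # fractional index into the track
        lo = int(math.floor(pos))
        hi = min(lo + 1, last)
        frac = pos - lo
        samples.append(f0_track[lo] * (1 - frac) + f0_track[hi] * frac)
    return samples


def log_z_scores(f0_values_hz, speaker_f0_hz):
    """Convert F0 values (Hz) to logarithmic z-scores relative to one speaker."""
    logs = [math.log10(v) for v in speaker_f0_hz]
    mu, sigma = mean(logs), stdev(logs)
    return [(math.log10(v) - mu) / sigma for v in f0_values_hz]


# Made-up F0 track in Hz, for illustration only.
track = [100 + i for i in range(21)]
points = sample_equidistant(track)
print([round(z, 2) for z in log_z_scores(points, track)])
```

The base of the logarithm does not affect the result, because changing the base only rescales the log values by a constant, which the z-score removes.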
3 Results
The complete dataset contains 22,800 monosyllabic tokens (adults' Mandarin tokens: 116 words × 10 speakers × 2 tokens/word = 2,320 tokens; adults' Fuzhou tokens: 112 words × 10 speakers × 2 tokens/word = 2,240 tokens; children's Mandarin tokens: 116 words × 40 speakers × 2 tokens/word = 9,280 tokens; children's Fuzhou tokens: 112 words × 40 speakers × 2 tokens/word = 8,960 tokens). After excluding tokens that were incomplete, inaudible, or not clearly produced, as well as those produced from imitated responses, the dataset used for the perceptual judgment consists of 11,536 valid Mandarin tokens (2,313 from the 10 adult Mandarin speakers and 9,223 from the 40 bilingual children) and 10,966 valid Fuzhou tokens (2,230 from the 10 adult Fuzhou speakers and 8,736 from the 40 children).
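The token counts above can be re-derived with a few lines of arithmetic. The sketch below uses only the figures reported in this paragraph; the dictionary names are hypothetical.

```python
WORDS = {"Mandarin": 116, "Fuzhou": 112}
SPEAKERS = {"adults": 10, "children": 40}
REPETITIONS = 2  # each word was produced twice

# Tokens elicited per group and language.
elicited = {
    (group, lang): WORDS[lang] * SPEAKERS[group] * REPETITIONS
    for group in SPEAKERS
    for lang in WORDS
}

# Valid tokens as reported in the text.
valid = {
    ("adults", "Mandarin"): 2313,
    ("children", "Mandarin"): 9223,
    ("adults", "Fuzhou"): 2230,
    ("children", "Fuzhou"): 8736,
}

excluded = {key: elicited[key] - valid[key] for key in elicited}

print(sum(elicited.values()))  # total elicited tokens
print(excluded)                # tokens excluded per group and language
```

On these figures, most of the excluded tokens come from the children's Fuzhou productions, which is consistent with the procedure's remark that imitated responses occurred only in the Fuzhou task.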
3.1 Production of Mandarin tones
The overall perceived accuracy of Mandarin tone productions is shown in Figure 1. Adults' Mandarin tone productions were perceived by the judges with ceiling accuracy for all four lexical tones (T1: M = 100%, SD = 0%; T2: M = 99.81%, SD = 0.01%; T3: M = 99.60%, SD = 0.84%; T4: M = 99.75%, SD = 0.53%). Like the Mandarin-speaking adults, the Mandarin-Fuzhou bilingual children in this study performed at ceiling for all four tones (T1: M = 99.90%, SD = 0.46%; T2: M = 99.35%, SD = 1.24%; T3: M = 98.59%, SD = 2.11%; T4: M = 99.69%, SD = 0.55%), and their overall averaged production accuracy was only slightly lower than that of adults.
Figure 1. Mandarin tone production accuracy of adults and children
Citation: Acta Linguistica Academica 71, 3; 10.1556/2062.2023.00733
A two-way mixed ANOVA on production accuracy was conducted to determine whether children's Mandarin tone production accuracy was significantly lower than that of adults, with participant group as the between-subject factor and tonal category as the within-subject factor. Mauchly's test indicated that the assumption of sphericity was violated, χ2 (5) = 65.151, P < 0.001, so degrees of freedom were corrected using Greenhouse-Geisser estimates of sphericity (ε = 0.565). The results showed that there was no significant interaction between participant group and tonal category, F (1.696, 81.385) = 1.162, P = 0.312, η2 = 0.016 (Greenhouse-Geisser correction), no significant main effect of participant group, F (1, 48) = 3.525, P = 0.067, η2 = 0.019, and no significant main effect of tonal category, F (1.696, 81.385) = 3.136, P = 0.057, η2 = 0.043 (Greenhouse-Geisser correction).
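The fractional degrees of freedom reported above follow from multiplying the uncorrected degrees of freedom by ε. The sketch below illustrates this, assuming 50 participants in two groups (10 adults and 40 children, consistent with the between-subject F (1, 48)) and four tone levels; the function name `gg_corrected_df` is hypothetical.

```python
def gg_corrected_df(epsilon, n_levels, n_participants, n_groups):
    """Greenhouse-Geisser corrected df for a within-subject effect in a mixed design."""
    df_effect = epsilon * (n_levels - 1)
    df_error = epsilon * (n_levels - 1) * (n_participants - n_groups)
    return df_effect, df_error


# Mandarin analysis: epsilon = 0.565 as reported, 4 tones, 50 participants, 2 groups.
df_effect, df_error = gg_corrected_df(0.565, 4, 50, 2)
print(round(df_effect, 3), round(df_error, 3))
```

The values obtained this way differ from the reported 1.696 and 81.385 only in the third decimal place, which is what one would expect if ε was rounded to three decimals for reporting.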
The bilingual children's adult-like performance is also attested by the normalized pitch curves of the correctly produced Mandarin tokens. Figure 2 shows that both adults and children produced Mandarin T1 with a relatively stable and high F0, with the overall contour around the LZ score of 1. Although adults' T1 is slightly higher than children's in terms of the overall pitch, such a minor discrepancy was not sufficient for the judges to perceive adults' T1 and children's T1 as two different tones. For Mandarin T2, adults and children share a similar rising contour throughout the production of the tone. Both curves begin at a mid or a relatively low point, start rising at the third (20%) or the fourth (30%) normalized time point, and reach a high point at the end of the tone (LZ score of 1 or above). Adults' curve is again slightly higher than children's with respect to the general pitch height, but the difference was hardly perceived in the perceptual judgment. Similarity in overall F0 contour can also be observed in Mandarin T3. Adults' T3 and children's T3 were both produced as a dipping tone, in which F0 first falls and then rises towards the end. The major difference between adults' and children's T3 contours lies in the position of the lowest point and the height of the ending point: adults' T3 reaches its nadir in F0 at the fifth (40%) normalized time point and ends above 0.5, while the nadir of children's T3 is at the sixth (50%) or seventh (60%) normalized time point and its ending point is lower than −0.5. T3 produced by adults in this study could be transcribed as a typical [214], while children's T3 was not as “standard” as adults' and hence could be transcribed as [212]. Nevertheless, Mandarin T3 is essentially a low-register tone and its most outstanding feature is its overall low F0 contour as compared to the other three Mandarin lexical tones (M. Chen 2000; Duanmu 2000; Lin 2007; Cao 2012; H. M. Zhang 2013; H. Zhang 2016). As can be seen in Figure 2, the F0 z-score of both adults' and children's T3 falls mostly within the range of 0 to −2, which is much lower than that of the other three tones. The discrepancy between adults' T3 and children's T3, therefore, does not lead to a significant difference in the perceived accuracy. For T4, both adults' T4 and children's T4 have a high-falling contour. The former has a markedly lower F0 z-score at the end of the tone, illustrating a much steeper negative slope as compared to the latter. Again, we can find that Mandarin-speaking adults in this study produced the tone in a more “standard” manner and hence their T4 was realized as [51], the tonal value that is traditionally used to transcribe Mandarin T4 when it stands alone or occurs in the final position of a word, phrase, or sentence. Although children's T4 in this study was phonetically a [52] tone, it could still be readily perceived as correct by native judges, because there is no other high-falling tone in Mandarin.
Figure 2. Normalized F0 contour (in LZ) for correctly produced Mandarin tones by adults and children
In addition to the general F0 contour, similarities were also found between adults' and children's correct Mandarin tone productions in terms of the mean duration of each lexical tone, as shown in Table 1. Two-tailed independent samples t-tests revealed that there was no significant difference in mean duration between adults' and children's productions for any of the four tones (adults vs. children: T1: t(48) = −1.675, P = 0.100, d = −0.592; T2: t(48) = −0.766, P = 0.447, d = −0.271; T3: t(48) = −1.607, P = 0.115, d = −0.568; T4: t(48) = 0.683, P = 0.498, d = 0.241).
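The effect sizes can be related to the t statistics with a standard identity for two independent groups: d = t × sqrt(1/n1 + 1/n2). The sketch below assumes a pooled-variance t-test with 10 adults and 40 children (consistent with df = 48) and checks the magnitudes of the reported d values; the function name `cohens_d_from_t` is hypothetical.

```python
import math


def cohens_d_from_t(t, n1, n2):
    """Cohen's d implied by an independent-samples t statistic (pooled variance)."""
    return t * math.sqrt(1 / n1 + 1 / n2)


N_ADULTS, N_CHILDREN = 10, 40

# t statistics as reported for the Mandarin duration comparisons.
reported_t = {"T1": -1.675, "T2": -0.766, "T3": -1.607, "T4": 0.683}

for tone, t in reported_t.items():
    d = cohens_d_from_t(t, N_ADULTS, N_CHILDREN)
    print(tone, round(abs(d), 3))
```

Under this identity the sign of d always matches the sign of t, so a negative t (adults shorter than children) corresponds to a negative d.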
Table 1. Comparison of adults' and children's duration of correctly produced Mandarin tones (mean duration and SD, in ms)
Group | T1 | T2 | T3 | T4 |
Adults | M = 379.0, SD = 40.3 | M = 407.6, SD = 77.6 | M = 481.3, SD = 52.5 | M = 285.5, SD = 22.9 |
Children | M = 406.4, SD = 47.4 | M = 426.0, SD = 65.4 | M = 517.6, SD = 66.3 | M = 274.1, SD = 51.2 |
3.2 Production of Fuzhou tones
3.2.1 Perceived accuracy of Fuzhou tones
Figure 3 shows the overall perceived accuracy of Fuzhou tone productions by adult Fuzhou speakers and Mandarin-Fuzhou bilingual children in this study. Like their Mandarin-speaking counterparts, Fuzhou-speaking adults were at ceiling for all the lexical tones (T1: M = 99.06%, SD = 1.51%; T2: M = 99.06%, SD = 1.51%; T3: M = 98.43%, SD = 1.66%; T4: M = 97.49%, SD = 1.98%; T5: M = 97.81%, SD = 2.11%; T6: M = 97.81%, SD = 1.51%; T7: M = 97.49%, SD = 1.98%). By contrast, among the seven tones produced by children, only T1 and T2 can be considered close to adult-like in perceived production accuracy (T1: M = 95.90%, SD = 2.85%; T2: M = 91.26%, SD = 6.59%), while the other five tones produced by children were all perceived with much lower accuracy and larger individual differences among the children (T3: M = 70.66%, SD = 9.67%; T4: M = 61.40%, SD = 8.30%; T5: M = 74.38%, SD = 7.75%; T6: M = 70.39%, SD = 8.65%; T7: M = 71.20%, SD = 8.60%), showing that Mandarin-Fuzhou bilingual children as old as 7;6 did not produce Fuzhou lexical tones as well as adults.
Figure 3. Fuzhou tone production accuracy of adults and children
To substantiate the above observation based on averaged data, a two-way mixed ANOVA on Fuzhou tone production accuracy was performed with participant group as the between-subject factor and tonal category as the within-subject factor. Mauchly's test showed that the assumption of sphericity was violated, χ2 (20) = 54.805, P < 0.001, so degrees of freedom were corrected using Greenhouse-Geisser estimates of sphericity (ε = 0.729). The results revealed a significant interaction between participant group and tonal category, F (4.373, 209.886) = 22.326, P < 0.001, η2 = 0.116 (Greenhouse-Geisser correction), and significant main effects for both participant group, F (1, 48) = 556.442, P < 0.001, η2 = 0.455, and tonal category, F (4.373, 209.886) = 27.209, P < 0.001, η2 = 0.141 (Greenhouse-Geisser correction). The interaction was further examined through analyses of simple main effects and pairwise comparisons with Bonferroni corrections. The results showed that there was no significant difference in production accuracy between different Fuzhou tones for adults (P = 0.173, η2 = 0.002), while there was a significant difference between tones for children (P < 0.001, η2 = 0.720). Among children's productions, no significant difference was found between T1 and T2, or among T3, T4, T5, T6, and T7, but T1 and T2 were both produced significantly more accurately than the other five tones (ps < 0.001, Cohen's d = 2.396 – 4.896). Testing for differences in perceived production accuracy between the two participant groups in each tonal category revealed that, except for T1 and T2, adults produced all the Fuzhou tones significantly better than children (ps < 0.001, Cohen's d = 3.325 – 5.122).
3.2.2 Error patterns in children's Fuzhou tone productions
The error patterns in bilingual children's Fuzhou tone productions were further examined by using the judges' transcriptions of the closest resembling tones for incorrectly produced tokens. Since the children's performance in their production of Fuzhou T1 and T2 was close to adult-like, only the error patterns of the other five tones are presented in Table 2. The percentages in the cells indicate the rates of perceiving incorrect tokens of a target tone as another tone. It is noteworthy that a mid-rising tone, which was transcribed by the judges as [35] and does not exist as a lexical tone in the Fuzhou dialect, was identified in children's incorrect productions of Fuzhou tones.
Table 2. Error patterns of incorrect Fuzhou tone tokens produced by children, separated by tone (rows: target tone; columns: tone perceived by the judges, in %)
Target tone | T1 [44] | T2 [53] | T3 [31/33] | T4 [21/213] | T5 [242] | T6 [23] | T7 [5] | [35] |
T3 [31/33] | 78.15 | 8.99 | – | 5.53 | 0 | 0 | 0 | 7.33 |
T4 [21/213] | 31.65 | 29.59 | 0 | – | 4.33 | 0 | 0 | 34.43 |
T5 [242] | 8.21 | 8.65 | 0 | 7.41 | – | 0 | 0 | 75.73 |
T6 [23] | 6.15 | 3.45 | 0 | 4.53 | 1.02 | – | 5.69 | 79.16 |
T7 [5] | 83.85 | 10.43 | 0 | 0 | 0 | 0.50 | – | 5.22 |
Consider children's Fuzhou T3 as an example. The forty children produced a total of 1,242 valid T3 tokens. Among these tokens, 359 were perceived as incorrect by one judge, and 364 by the other judge. The two judges transcribed these inaccurate tokens separately and together identified 723 closest resembling tones for these tokens, among which 565 were T1 (78.15%), 65 were T2 (8.99%), 53 were [35] (7.33%), and 40 were T4 (5.53%). Therefore, “T3 → T1” was extracted as the major error pattern for children's T3, as it accounted for more than 14.29% (1/7) of the errors identified by the two judges (chance level = 1/7, assuming that a target tone in Fuzhou is randomly mispronounced as any of the other six Fuzhou lexical tones or [35]).
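The extraction of major error patterns can be sketched as follows, using the T3 counts reported in this paragraph. The function name `major_error_patterns` is hypothetical, and the sketch assumes, as stated above, seven possible substitutes (the other six Fuzhou tones plus [35]).

```python
def major_error_patterns(counts, n_alternatives=7):
    """Return substitutes whose share of all error transcriptions exceeds chance.

    `counts` maps each perceived (substitute) tone to the number of error
    transcriptions, pooled over both judges. Shares are returned in percent.
    """
    total = sum(counts.values())
    chance = 1 / n_alternatives
    return {
        substitute: round(100 * n / total, 2)
        for substitute, n in counts.items()
        if n / total > chance
    }


# Pooled transcriptions of children's incorrect Fuzhou T3 tokens (from the text).
t3_errors = {"T1": 565, "T2": 65, "[35]": 53, "T4": 40}
print(major_error_patterns(t3_errors))
```

Only T1 exceeds the 1/7 threshold for T3; the remaining substitutes each account for less than 10% of the error transcriptions.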
As presented in Table 3, seven major error patterns were identified in children's productions of Fuzhou T3, T4, T5, T6, and T7. Specifically, children tended to produce Fuzhou T3 as T1; T4 as T1, T2, or [35]; T5 as [35]; T6 as [35]; and T7 as T1. In other words, children rarely mispronounced a target tone as T3, T5, T6, or T7 in the Fuzhou dialect. Instead, most of the error patterns in children's production involved mispronunciations of target tones as T1, T2, and the mid-rising [35].
Table 3. Major error patterns of incorrect Fuzhou tone tokens produced by children
Target tone | Major error patterns |
T3 [31/33] | T3 → T1 [44] (78.15%) |
T4 [21/213] | T4 → T1 [44] (31.65%), T4 → T2 [53] (29.59%), T4 → [35] (34.43%) |
T5 [242] | T5 → [35] (75.73%) |
T6 [23] | T6 → [35] (79.16%) |
T7 [5] | T7 → T1 [44] (83.85%) |
It is interesting to note that, among the three most frequent erroneous substitute forms (i.e., T1, T2, and [35]), both Fuzhou T1 (high-level) and T2 (high-falling) have their counterparts in Mandarin (i.e., Mandarin T1 and T4, respectively), and the mid-rising [35], though not found in the Fuzhou lexical tone inventory, is exactly T2 in Mandarin. By contrast, for the four rarely occurring substitutes (i.e., Fuzhou T3, T5, T6, and T7), no corresponding tonal categories can be found in Mandarin. We will return to this in Section 4.
3.2.3 Acoustic properties of produced tones
Some of the above-mentioned findings in bilingual children's Fuzhou productions can be further attested and explained by the normalized pitch curves of the participants' Fuzhou tone productions that were labeled as correct by the judges (Figure 4) and the duration measurement (Table 4).
Figure 4. Normalized F0 contour (in LZ) for correctly produced Fuzhou tones by adults and children
Table 4. Comparison of adults' and children's duration of correctly produced Fuzhou tones (mean duration and SD, in ms)
Group | T1 | T2 | T3 | T4 | T5 | T6 | T7 |
Adults | M = 378.0, SD = 37.6 | M = 267.4, SD = 22.3 | M = 370.7, SD = 41.5 | M = 339.9, SD = 35.9 | M = 372.3, SD = 41.0 | M = 205.9, SD = 20.8 | M = 171.1, SD = 10.5 |
Children | M = 390.8, SD = 24.3 | M = 271.5, SD = 29.3 | M = 353.4, SD = 45.7 | M = 349.6, SD = 54.9 | M = 341.4, SD = 51.5 | M = 273.0, SD = 32.0 | M = 276.7, SD = 26.9 |
As shown in Figure 4, similar to their Mandarin T1, both adults' and children's Fuzhou T1 has a relatively stable and high F0, and adults' Fuzhou T1 is slightly higher than children's in the overall pitch height. As for Fuzhou T2, adults' and children's curves share a similar falling contour, although children's T2 exhibited a relatively higher F0 than adults' T2 before the seventh (60%) normalized time point and a lower F0 after that point. Despite these pitch height differences between adults and children, however, T1 and T2 are the only high-level and high-falling non-checked tones in the Fuzhou dialect. Therefore, these minor differences did not affect the judges' perceptual judgment, which explains the non-significant difference in perceived accuracy rates of Fuzhou T1 and T2 between adults and children.
For Fuzhou T3, although the tokens used for the F0 analysis were all treated as correct by the judges, there are noticeable differences in pitch contour between adults' and children's productions. Adults' Fuzhou T3 is clearly a falling tone with a gradual slope, starting from a z-score point around 0.3 and descending towards the ending point of −1. By contrast, children's correctly produced T3 has a much more stable F0 with a very slight slope, and the overall normalized pitch curve is mostly located within the LZ range of 0∼−0.3. According to the pitch curves, adults' T3 could be transcribed as a mid-falling tone, e.g., [31], whereas children's T3 should be considered as a mid-level tone, e.g., [33]. As both [31] and [33] have been adopted to transcribe Fuzhou T3 in previous studies and the judges in this study perceived both as correct Fuzhou T3, such a discrepancy between adults and children could be treated as a generational difference. However, it is noteworthy that, for most of children's T3 tokens that were perceived as incorrect, the judges commented on the unnaturally high pitch, which made those tokens sound like a high-level tone rather than a mid-level tone; these tokens were thus perceived as Fuzhou T1 in the perceptual judgment. This “T3 → T1” error pattern has been shown in Table 3, and we will return to this in Section 4.
Figure 4 reveals no significant difference between the correct Fuzhou T4 tokens produced by children and adults with respect to the overall pitch height, and we can see that both adults' and children's correctly produced T4 exhibited an overall low pitch contour. Note that adults' and children's correct T4 were not different from each other in terms of duration either (Table 4). Hence, the normalized pitch curves of the correct tokens and the duration measurement do not provide any explanation for why children's production of Fuzhou T4 in this study had the lowest perceived accuracy and the most varied error patterns among the seven Fuzhou lexical tones, which could thus only be ascribed to children's relatively later acquisition of this tone. This will be further discussed in Section 4.
The normalized pitch curves of Fuzhou T5 show a major difference between adults' and children's productions, although the tokens used for the F0 analysis were all perceived to be correct. Despite the shared rising-falling contour, the F0 peak of adults' Fuzhou T5 appears right at the sixth (50%) normalized time point, which divides the curve into two equal parts, whereas the F0 peak of children's T5 occurs later, between the seventh (60%) and the eighth (70%) time points. Also, the ending point of children's T5 (above −0.5 in LZ) is much higher than that of adults' (around −0.8 in LZ). Therefore, even for children's T5 tokens that were labeled as correct, it was hard for the judges to transcribe the tone as a standard rising-falling Fuzhou T5 [242]. Moreover, according to the judges' comments in the perceptual judgment, in many T5 tokens produced by children, the judges could hardly hear the falling part, which contributed to the high percentage of the “T5 → [35]” error pattern (75.73%) in children's incorrect T5 productions, as shown in Tables 2 and 3.
For the correctly produced Fuzhou T6 tokens, it is quite obvious that children's T6 is much higher than adults'. Comparing Figures 2 and 4, we can find that, despite the shared rising contour, adults' Fuzhou T6 (mostly within LZ range −1∼0) is considerably lower than their Mandarin T2 (mostly within LZ range −0.5∼1), while children's correctly produced Fuzhou T6 (mostly within LZ range −0.5∼0.5) and Mandarin T2 (mostly within LZ range −0.5∼1) are closer to each other in terms of the overall pitch height. Moreover, two-tailed independent samples t-tests showed that even children's T6 tokens that were perceived as correct were significantly longer in mean duration than their adult counterparts (adults' T6 vs. children's T6: t(48) = −6.285, P < 0.001, d = −2.222), while such a difference was not found in adults' and children's correct productions of the five non-checked tones in Fuzhou (adults vs. children: T1: t(48) = −1.327, P = 0.191, d = −0.469; T2: t(48) = −0.414, P = 0.681, d = −0.146; T3: t(48) = 1.086, P = 0.283, d = 0.384; T4: t(48) = −0.527, P = 0.600, d = −0.186; T5: t(48) = 1.757, P = 0.085, d = 0.621). On a related note, when transcribing children's incorrect T6 tokens, the judges noted that most of those tokens had unnaturally high pitch and even longer duration, as compared to adults' productions. The resemblance between their Fuzhou T6 and Mandarin T2 and the noticeably longer duration thus made children's Fuzhou T6 sound like Mandarin T2, which explains why nearly 80% of children's incorrect Fuzhou T6 tokens were labeled as [35] in the perceptual judgment.
Children's accurate Fuzhou T7 tokens share a high and stable F0 with adults', as illustrated in Figure 4. The most important discrepancy between their productions lies in the mean duration. Table 4 shows that the mean duration of adults' T7 was approximately half of, or even less than half of, that of the non-checked T1, T3, T4, and T5, while even children's correct T7 exceeded two-thirds of the duration of these four tones and was significantly longer than its adult counterpart (adults' T7 vs. children's T7: t(48) = −12.095, P < 0.001, d = −4.276). The judges reported that the children's T7 tokens labeled as accurate were not perceived as entirely natural in terms of duration, but the incorrect tokens were much longer still, making it impossible for them to be perceived as correct. Given the excessive duration and the high, stable F0, it is unsurprising that “T7 → T1” was the most frequent error pattern (83.85%) found in children's erroneous Fuzhou T7 productions.
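The duration relationship between the checked tone T7 and the non-checked T1, T3, T4, and T5 can be checked directly from the mean durations in Table 4. The sketch below only re-uses those reported means; the variable and function names are hypothetical.

```python
# Mean durations in ms, copied from Table 4.
MEAN_DURATION = {
    "adults":   {"T1": 378.0, "T3": 370.7, "T4": 339.9, "T5": 372.3, "T7": 171.1},
    "children": {"T1": 390.8, "T3": 353.4, "T4": 349.6, "T5": 341.4, "T7": 276.7},
}


def t7_ratios(group):
    """Ratio of mean T7 duration to each of the non-checked T1, T3, T4, and T5."""
    means = MEAN_DURATION[group]
    return {tone: means["T7"] / means[tone] for tone in ("T1", "T3", "T4", "T5")}


for group in MEAN_DURATION:
    print(group, {tone: round(r, 2) for tone, r in t7_ratios(group).items()})
```

On these means, adults' T7 is about 45% to 50% as long as the four non-checked tones, whereas children's T7 is more than two-thirds as long as each of them.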
4 Discussion
As the first attempt to investigate the development of lexical tones in bilingual children acquiring Mandarin and the Fuzhou dialect, this study examined 6- to 7-year-old children's Mandarin and Fuzhou tone production and compared their performance with that of adults in terms of both production accuracy and acoustic features. Our findings revealed that the Mandarin-Fuzhou bilingual children in this study, despite being considered by their parents to have higher proficiency in the Fuzhou dialect than their peers and demonstrating adult-like performance in Mandarin tone production, require a longer period for complete mastery of the Fuzhou tonal system due to the influence from Mandarin.
Based on the results presented in Section 3, we can observe that although the Mandarin tone production of Mandarin-Fuzhou bilingual children aged 6;11 to 7;6 was not as perfect or “standard” as that of Mandarin-speaking adults, their productions of all four lexical tones had already reached adult-like ceiling accuracy according to the perceptual judgment of native judges. Moreover, the acoustic characteristics of their Mandarin tone productions, especially the overall F0 contour and mean duration, were not significantly different from those of adults' productions.
In stark contrast to their ceiling performance in Mandarin tone production, these bilingual children, who had unbalanced exposure to Mandarin and the Fuzhou dialect, were prone to erroneous tone productions in the Fuzhou dialect. Among the seven Fuzhou lexical tones, they produced only two (i.e., T1 and T2) with adult-like accuracy, while their productions of the other five were perceived with significantly lower accuracy as compared to adults' productions. These children's production accuracy of the seven tones in descending order was T1 > T2 > T5 > T7 > T3 > T6 > T4, and their T1 and T2 were perceived significantly better than the other tones, which indicates not only the protracted nature of their tone acquisition process in the Fuzhou dialect, but also an imbalance in their development of different Fuzhou tones.
Further examinations of the error patterns and the acoustic characteristics of children's Fuzhou tone productions showed that there were systematic errors in children's productions and that their development of Fuzhou tones was indeed affected by their dominant language, Mandarin. Our findings suggest that, among the seven Fuzhou lexical tones, T1 and T2 are the easiest for 6- to 7-year-old bilingual children to produce. This is not surprising because we can find their counterparts in Mandarin: both Fuzhou T1 and Mandarin T1 are high-level, and both Fuzhou T2 and Mandarin T4 are high-falling. Although Mandarin T1 [55] is higher than Fuzhou T1 [44] in terms of the overall pitch height and Mandarin T4 [51] has a steeper negative slope as compared to Fuzhou T2 [53], it seems that children in our study did not distinguish the tone pairs Mandarin T1-Fuzhou T1 and Mandarin T4-Fuzhou T2. Comparing Figures 2 and 4, we can find that the overall pitch heights and contours of children's Fuzhou T1 and T2 were very close to those of their Mandarin T1 and T4. Clearly, children's acquisition of Fuzhou T1 and T2 was greatly aided by their correspondence with Mandarin tonal categories.
Regarding the other five tones in the Fuzhou dialect, similarities were observed in children's T3, T5, T6, and T7. Since Mandarin does not have a mid-level/mid-falling tone or a rising-falling tone, nor does it have checked tones, none of Fuzhou T3, T5, T6, and T7 has a direct counterpart in Mandarin. The characteristics of these tones, especially the extra complex contour of T5 [242] and the particularly short duration of T6 and T7, thus make them difficult for children to master. Therefore, it could be argued that it is the lack of correspondence between Mandarin and Fuzhou that led to the significantly lower accuracy rates of these four tones in children's production as compared to their Fuzhou T1 and T2.
Moreover, the analysis of children's error patterns of Fuzhou T3, T5, T6, and T7 revealed that the similarity in the overall pitch contour between two tones played an essential role in children's use of the substitute forms. Specifically, children mostly confused their Fuzhou T3 [33] and T7 [5] with Fuzhou T1 [44] (or Mandarin T1 [55]), and confused Fuzhou T5 [242] and T6 [23] with Mandarin T2 [35]. This finding echoes well the findings of error patterns in Wong et al.'s (2017) study on Cantonese-speaking children's tone production, and suggests that 6- to 7-year-old bilingual children in this study have not acquired the ability to distinguish tones with similar contours. Such a lack of ability to distinguish similar contours may also reflect children's protracted development in their tone perception ability, which should be investigated in future studies to provide a more complete picture of Mandarin-Fuzhou bilingual children's tone acquisition.
As an aside, it was found in our study that children tended to either produce Fuzhou T3 as a mid-level tone [33] with a very slight slope or confuse it with a high-level tone, whereas the adult participants consistently produced it as a mid-falling tone [31] with a steeper slope. Here we offer one possible account for children's [33]. The average age of the Fuzhou-speaking adults in our study was 59, while the parents of 6- to 7-year-old children are generally much younger (usually under 40 years old). A preliminary test on the Fuzhou T3 produced by these children's parents showed that they were more inclined to produce a mid-level tone rather than a mid-falling tone. This indicates that the difference between [31] and [33] may indeed reflect a generational variation, as suggested in Section 3.2.3, and that children's mid-level Fuzhou T3 is very likely to come from their parents; a further study on the Fuzhou T3 production of these bilingual children's parents would help to shed light on this. The input of parents' mid-level Fuzhou T3, together with children's underdeveloped ability to distinguish mid-level from high-level tones, might have contributed to the high percentage of confusion between children's Fuzhou T3 and the high-level tone.
In terms of the production of Fuzhou T4, the results in Figure 3 and Table 3 showed that children made more errors and had more diverse error patterns in their production, as compared to all the other tones, suggesting that Fuzhou T4 may be the most difficult tone for Mandarin-Fuzhou bilingual children at 6;11–7;6. Since Fuzhou T4, usually labelled as [21/213], has a direct counterpart in Mandarin tonal system, namely Mandarin T3 [214/21], it is not easy to speculate on why Fuzhou T4 was produced by children with the lowest average accuracy rate and the most divergent error patterns. One possible explanation is that Fuzhou T4 never maintains its citation tonal value when it undergoes tone sandhi in connected speech—it is changed to either [44] or [51] whenever followed by another tone in a tone sandhi context (R. Li et al. 1994; You 2020; among others). This coarticulatory effect might affect how children perceive this tone when they are exposed to it in adult speech, which could be one factor contributing to children's difficulty in its production. Another factor at play could be the articulatory complexity associated with tones like Fuzhou T4 and Mandarin T3. Although the order of acquisition of Mandarin tones varies across previous studies, many have reported that T3 is the last or the second last to be acquired (C. N. Li & Thompson 1977; Wong, Schwartz & Jenkins 2005; Wong 2012, 2013; R. Xu et al. 2018; among others). Wong (2012, 2013) provided an explanation for children's Mandarin tone development from a physiological perspective and argued that T3 has the highest degree of articulatory complexity among the four Mandarin tones. Given the similarity between Fuzhou T4 and Mandarin T3, a similar physiological account may be relevant in understanding children's performance in their Fuzhou T4 productions in this study. 
The absence of Fuzhou T4's citation form in connected speech, its relatively high degree of articulatory complexity, and other as yet unidentified factors may collectively contribute to the challenges children face in acquiring this tone. Further research is needed for a deeper understanding of this issue.
On a related note, our data also showed that, in the seven major error patterns of children's Fuzhou tone productions (Table 3), there were only three substitute forms, namely Fuzhou T1, Fuzhou T2, and the mid-rising [35]. Among them, both Fuzhou T1 and T2 have their counterparts in Mandarin (i.e., Mandarin T1 and T4), and the mid-rising [35] is exactly Mandarin T2. This indicates that children tended to use tones they could produce with high accuracy (i.e., Fuzhou T1 or Mandarin T1, Fuzhou T2 or Mandarin T4, and Mandarin T2) in their substitution patterns. Specifically, for Fuzhou T3, T5, T6, and T7, as discussed above, tones with similar pitch contours were likely to emerge as the substitute forms (i.e., Fuzhou T1 or Mandarin T1 for Fuzhou T3 and T7, Mandarin T2 for Fuzhou T5 and T6). By contrast, for Fuzhou T4, which was the most difficult for bilingual children in this study, children appeared to select any one of the tones they were familiar with as the substitute form, with no clear preference, when they were not able to correctly produce this low-dipping tone. This finding provides further evidence of cross-linguistic transfer from the Mandarin tonal system to children's Fuzhou tone productions, and echoes Abudarham's (1987) hypothesis that bilingual children are likely to use some phonological features of the more dominant language when they speak the less dominant one.
It has been argued that, as compared to monolingual children, the input to which bilingual children are exposed plays a more decisive role in their language development (Montrul 2008; Grüter & Paradis 2014), and the development of the two languages is usually unbalanced because of the typically unevenly distributed dual input (Bernardini & Schlyter 2004). Our findings reveal such an unbalanced development in 6- to 7-year-old Mandarin-Fuzhou bilingual children's production of lexical tones, and demonstrate the imbalanced interference between children's two languages—it is their dominant language (Mandarin) that strongly affects the acquisition of the less dominant language (Fuzhou), not the other way around.7
Our findings seem to further support the results in Light (1977), Mok & Lee (2018), and Yao et al. (2020), where cross-linguistic influence was reported, but stand in stark contrast to those in previous studies on bilingual children acquiring two tonal languages. According to So & Leung (2006), Law & So (2006), and X. Li (2020), Mandarin-Cantonese and Mandarin-Southern Min bilingual children made very few errors in their productions of lexical tones in the less dominant language, and there was no cross-linguistic transfer from the dominant language Mandarin. However, our data showed that this is not the case for Mandarin-Fuzhou bilingual children aged 6;11–7;6. As the child participants in our study and those in So & Leung (2006), Law & So (2006), and X. Li (2020) are all simultaneous bilinguals, the discrepancies in findings are most plausibly accounted for by differences in the quantity and quality of tonal input to the bilingual children in the different studies. Even though the bilingual children in this study have been receiving dual input from their parents and grandparents at home, the relatively uneven input to which they have been exposed has resulted in more protracted and more unbalanced development in their Fuzhou tone production. This implies that extended time and more substantial Fuzhou input (at home, in the community, and even at school) are necessary for them to eventually achieve adult-like production performance.
Our assumption that the Mandarin-Fuzhou bilingual children would eventually acquire a version of Fuzhou similar to the speech of the adults around them is partially grounded in the well-documented observation that children learn and adapt their linguistic systems based on the input they receive from adults in their linguistic environment. Additionally, we conducted a preliminary analysis of the productions of a few Mandarin-Fuzhou bilingual children aged around 12, right after the reported production experiment. These children shared similar language backgrounds with the 6- to 7-year-olds involved in our current study, but their T3, T4, T5, T6, and T7 had approached adult-like performance, with perceived accuracy rates exceeding 85%. We also tracked the phonological development of three children (1 female from 7;1 to 7;11, 2 males from 7;3 to 8;2) who participated in our current study by recording them at weekly intervals in their natural conversations with parents and/or grandparents at home. This longitudinal study has been ongoing for nearly one year, with each recording lasting approximately 30 min. From the analysis of recordings collected in the last three months, we observed that their performance in Fuzhou T4, T5, T6, and T7 was closer to that of adults, to different extents, than their performance during the reported production experiment. Based on these observations, we speculate that the lexical tone production of these bilingual children may ultimately become similar to that of adults as they grow into adulthood. As neither the cross-sectional study on older bilingual children nor the longitudinal study has yet been completed, the exact timeline and the completion of children's acquisition process require further research.8
5 Conclusion
The current study provides the first empirical evidence for the unequal development of Mandarin and Fuzhou lexical tones in 6- to 7-year-old Mandarin-Fuzhou simultaneous bilingual children. In sum, the results show that, although these children produced Mandarin tones with adult-like accuracy, they had not mastered the production of Fuzhou tones by the age of 7;6. Among the seven lexical tones in the Fuzhou dialect, only T1 and T2 could be considered close to adult-like, while all the other tones were produced by children with significantly lower accuracy. Our findings suggest that the differences between the Mandarin and Fuzhou tonal systems and the noticeable exposure imbalance between these two languages are two of the most essential factors contributing to these bilingual children's much more protracted and unbalanced development of Fuzhou tones. Our study also demonstrates the effect of cross-linguistic transfer in the tone production of bilingual children acquiring two tonal languages simultaneously from birth, showing that the phonological features of tones in the more dominant language can strongly affect children's acquisition of tones in the less dominant language. This study not only extends the literature on children's tone acquisition by investigating the production of tones by an understudied bilingual population, but also opens a new line of inquiry. Given that the 6- to 7-year-old Mandarin-Fuzhou bilingual children in this study, who are considered to excel beyond their peers in terms of their Fuzhou proficiency, are still developing their abilities in Fuzhou lexical tone production, we may ask the following questions: Is there an age effect in the development of these children's Fuzhou tones, and if yes, when will they achieve adult-like production accuracy in all seven lexical tones? Is the development of their tone production abilities associated with that of their perception, and if yes, how? Will the late and unbalanced development in lexical tone production lead to an even more protracted process when these children acquire the complex tone sandhi patterns in the Fuzhou dialect? All these questions should be addressed in follow-up studies to further our understanding of tone acquisition by bilingual children acquiring two tonal languages.
Competing interest declaration
Competing interests: The authors declare none.
Acknowledgments
This research was partially supported by CUHK Direct Grant (Project No. 4051187) “Tonal acquisition by children speaking the Fuzhou dialect: A preliminary study on the acquisition of non-entering tones and disyllabic non-entering tone sandhi”. We are profoundly grateful to Professor András Cser and the two anonymous reviewers of Acta Linguistica Academica for their insightful comments and suggestions.
References
Abudarham, Samuel. 1987. Fact and friction. In S. Abudarham (ed.) Bilingualism and the bilingual: An interdisciplinary approach to pedagogical and remedial issues. Berkshire: NFER-Nelson. 15–34.
Bernardini, Petra and Suzanne Schlyter. 2004. Growing syntactic structure and code-mixing in the weaker language: The ivy hypothesis. Bilingualism: Language and Cognition 7(1). 49–69.
Boersma, Paul and David Weenink. 2020. Praat: Doing phonetics by computer. Computer program, Version 6.1.16. Retrieved from http://www.praat.org/.
Cao, Jianfen. 2012. Pitch prominence and tonal typology for low register tone in Mandarin. Proceedings of the 3rd International Symposium on Tonal Aspects of Languages (TAL2012). Retrieved from https://www.isca-speech.org/archive/tal_2012/cao12_tal.html.
Chan, Marjorie. 1980. Syntax and phonology interface: The case of tone sandhi in the Fuzhou dialect of Chinese. Manuscript. University of Washington, Seattle, WA.
Chao, Yuen-Ren. 1930. A system of tone-letters. Le Maître Phonétique 45. 24–27.
Chao, Yuen-Ren. 1951. The Cantian idiolect: An analysis of the Chinese spoken by a twenty-eight-months-old child. In W. J. Fischel (ed.) Semitic and oriental studies. Berkeley, CA: University of California Press. 27–44.
Chen, Matthew. 2000. Tone sandhi: Patterns across Chinese dialects. Cambridge: Cambridge University Press.
Chen, Leo and Jerry Norman. 1965. An introduction to the Foochow dialect. San Francisco, CA: San Francisco State College.
Chen, Ping. 1999. Modern Chinese: History and sociolinguistics. Cambridge: Cambridge University Press.
Chen, Zeping. 1998. Fuzhou fangyan yanjiu [A study of the Fuzhou dialect]. Fuzhou: Fujian People’s Publishing House.
Clumeck, Harold. 1980. The acquisition of tone. In G. H. Yeni-Komshian, J. F. Kavanagh and C. A. Ferguson (eds.) Child phonology, Vol. I: Production. New York, NY: Academic Press. 257–275.
De Houwer, Annick. 1995. Bilingual language acquisition. In P. Fletcher and B. MacWhinney (eds.) The handbook of child language. Cambridge, MA: Blackwell. 219–250.
Donohue, Cathryn. 2013. Fuzhou tonal acoustics and tonology. Munich: Lincom Europa.
Duanmu, San. 2000. The phonology of Standard Chinese. Oxford: Oxford University Press.
Fleiss, Joseph L. 1971. Measuring nominal scale agreement among many raters. Psychological Bulletin 76(5). 378–382.
Fleiss, Joseph L., Bruce Levin and Myunghee Cho Paik. 2003. Statistical methods for rates and proportions, 3rd edn. Hoboken: Wiley.
Grech, Helen and Sharynne McLeod. 2012. Multilingual speech and language development and disorders. In D. Battle (ed.) Communication disorders in multicultural populations, 4th edn. St. Louis, MO: Elsevier. 120–147.
Grüter, Theres and Johanne Paradis (eds.). 2014. Input and experience in bilingual development. Amsterdam & Philadelphia, PA: John Benjamins.
Holm, Alison and Barbara Dodd. 1999. A longitudinal study of the phonological development of two Cantonese–English bilingual children. Applied Psycholinguistics 20(3). 349–376.
Holm, Alison and Barbara Dodd. 2006. Phonological development and disorder of bilingual children acquiring Cantonese and English. In H. Zhu and B. Dodd (eds.) Phonological development and disorders in children: A multilingual perspective. Clevedon: Multilingual Matters. 286–325.
Hsu, Hui-Chuan. 1989. Phonological acquisition of Taiwanese: A longitudinal case study. M.A. thesis. National Tsing Hua University, Hsinchu City.
Hu, Mingxiao and Zhenxing Zhang. 2020. Yue Gang Ao Dawanqu yuyan yanjiu zongshu [A review of research on linguistic varieties used in Guangdong-Hong Kong-Macao Greater Bay Area]. Chinese Journal of Language Policy and Planning 1. 34–45.
International Expert Panel on Multilingual Children's Speech. 2012. Multilingual children with speech sound disorders: Position paper. Retrieved from https://www.csu.edu.au/research/multilingual-speech/iepmcs.
Landis, Richard and Gary Koch. 1977. The measurement of observer agreement for categorical data. Biometrics 33. 159–174.
Law, Naska and Lydia So. 2006. The relationship of phonological development and language dominance in bilingual Cantonese–Putonghua children. International Journal of Bilingualism 10(4). 405–427.
Li, Charles N. and Sandra A. Thompson. 1977. The acquisition of tone in Mandarin-speaking children. Journal of Child Language 4(2). 185–199.
Li, David C. S. 2006. Chinese as a lingua franca in greater China. Annual Review of Applied Linguistics 26. 149–176.
Li, Rulong, Yuzhang Liang, Guangchun Zou and Zeping Chen. 1994. Fuzhou fangyan cidian [The dictionary of the Fuzhou dialect]. Fuzhou: Fujian People’s Publishing House.
Li, Shihuan. 2013. Tuipu huanjing xia fangyan de shengcun xianzhuang yu fazhan: Yi Fuzhouhua weili [Living situation and the development of the dialects in the circumstance of promoting Putonghua: Taking Fuzhou dialect as an example]. M.A. thesis. Tianjin University, Tianjin.
Li, Xiaolin. 2020. The acquisition of Xiamen citation tones and tone sandhi by children. M.Phil. thesis. The Chinese University of Hong Kong, Hong Kong.
Light, Timothy. 1977. CLAIRETALK: A Cantonese-speaking child's confrontation with bilingualism. Journal of Chinese Linguistics 5(2). 261–275.
Lin, Yen-Hwei. 2007. The sounds of Chinese. Cambridge: Cambridge University Press.
Luang-Thongkum, Theraphan. 1997. Tone change and language contact: A case study of Mien-Yao and Thai. In A. S. Abramson (ed.) Southeast Asian linguistic studies in honour of Vichin Panupong. Bangkok: Chulalongkorn University Press. 153–160.
Mok, Peggy, Holly Fung and Vivian Li. 2019. Assessing the link between perception and production in Cantonese tone acquisition. Journal of Speech, Language, and Hearing Research 62(5). 1243–1257.
Mok, Peggy and Albert Lee. 2018. The acquisition of lexical tones by Cantonese–English bilingual children. Journal of Child Language 45(6). 1357–1376.
Mok, Peggy, Vivian Li and Holly Fung. 2020. Development of phonetic contrasts in Cantonese tone acquisition. Journal of Speech, Language, and Hearing Research 63(1). 95–108.
Montrul, Silvina. 2008. Incomplete acquisition in bilingualism: Re-examining the age factor. Amsterdam & Philadelphia, PA: John Benjamins.
Romaine, Suzanne. 2001. Bilingual language development. In M. Barrett (ed.) The development of language. Hove: Psychology Press. 252–275.
Rose, Phil. 1987. Considerations in the normalization of the fundamental frequency of linguistic tone. Speech Communication 6(4). 343–351.
Royal College of Speech and Language Therapists. 2006. Communicating quality, 3rd edn. London: Royal College of Speech and Language Therapists.
So, Lydia and Barbara Dodd. 1995. The acquisition of phonology by Cantonese-speaking children. Journal of Child Language 22(3). 473–495.
So, Lydia and Cheung-Shing Leung. 2006. Phonological development of Cantonese-Putonghua bilingual children. In H. Zhu and B. Dodd (eds.) Phonological development and disorders in children: A multilingual perspective. Clevedon: Multilingual Matters. 413–428.
Tang, Zhixiang and Deliang Liu. 2008. Shenzhen Tai Gang qingnian xuesheng de yuyan quxiang he yuyan rentong [Language orientation and language identity of Taiwan and Hong Kong youth students in Shenzhen]. Research on Chinese as a Second Language 00. 199–213.
Thurgood, Graham. 1999. From ancient Cham to modern dialects: Two thousand years of language contact and change. (Oceanic Linguistics Special Publication No. 28). Honolulu, HI: University of Hawaiʻi Press.
To, Carol, Pamela Cheung and Sharynne McLeod. 2013. A population study of children’s acquisition of Hong Kong Cantonese consonants, vowels, and tones. Journal of Speech, Language, and Hearing Research 56(1). 103–122.
Tsay, Jane. 2001. Taiwanhua shengdiao xide de yanjiu [Study on the acquisition of Taiwanese tones]. Bashijiu nian Guokehui Yuyanxuemen (yiban yuyanxue) yanjiu chengguo fabiaohui lunwenji [Proceedings of the 2000 Research Symposium of Linguistics Division (General Linguistics), National Science Council]. 237–255.
Tse, John K. P. 1978. Tone acquisition in Cantonese: A longitudinal case study. Journal of Child Language 5(2). 191–204.
Valdés, Guadalupe and Richard Figueroa. 1994. Bilingualism and testing: A special case of bias. Norwood, NJ: Ablex.
Wong, Puisan. 2012. Acoustic characteristics of three-year-olds’ correct and incorrect monosyllabic Mandarin lexical tone productions. Journal of Phonetics 40(1). 141–151.
Wong, Puisan. 2013. Perceptual evidence for protracted development in monosyllabic Mandarin lexical tone production in preschool children in Taiwan. Journal of the Acoustical Society of America 133(1). 434–443.
Wong, Puisan, Wing Fu and Eunice Cheung. 2017. Cantonese-speaking children do not acquire tone perception before tone production: A perceptual and acoustic study of three-year-olds’ monosyllabic tones. Frontiers in Psychology 8. 1450. Retrieved from https://www.frontiersin.org/articles/10.3389/fpsyg.2017.01450/full.
Wong, Puisan and Carrie Leung. 2018. Suprasegmental features are not acquired early: Perception and production of monosyllabic Cantonese lexical tones in 4- to 6-year-old preschool children. Journal of Speech, Language, and Hearing Research 61(5). 1070–1085.
Wong, Puisan, Richard Schwartz and James Jenkins. 2005. Perception and production of lexical tones by 3-year-old, Mandarin-speaking children. Journal of Speech, Language, and Hearing Research 48(5). 1065–1079.
Wong, Puisan and Winifred Strange. 2017. Phonetic complexity affects children’s Mandarin tone production accuracy in disyllabic words: A perceptual study. PLoS ONE 12(8). 1–21.
Xu, Rattanasone, Ping Tang, Ivan Yuen, Liqun Gao and Katherine Demuth. 2018. Five-year-olds’ acoustic realization of Mandarin tone sandhi and lexical tones in context are not yet fully adult-like. Frontiers in Psychology 9. 817. Retrieved from https://www.frontiersin.org/articles/10.3389/fpsyg.2018.00817/full.
Xu, Yi. 2013. ProsodyPro: A tool for large-scale systematic prosody analysis. In B. Bigi and D. Hirst (eds.) Proceedings of Tools and Resources for the Analysis of Speech Prosody (TRASP 2013). Aix-en-Provence: Laboratoire Parole et Langage. 7–10. Retrieved from http://www.homepages.ucl.ac.uk/~uclyyix/yispapers/Xu_TRASP2013.pdf.
Yao, Yao, Angel Chan, Roxana Fung, Wing-Li Wu, Natalie Leung, Sarah Lee and Jin Luo. 2020. Cantonese tone production in pre-school Urdu–Cantonese bilingual minority children. International Journal of Bilingualism 24(4). 767–782.
You, Shuxiang. 2018. Clitics and the clitic group in the Fuzhou dialect. International Journal of Chinese Linguistics 5(1). 125–166.
You, Shuxiang. 2020. Prosodic phonology of the Fuzhou dialect: Domains and rule application. London & New York, NY: Routledge.
Zhang, Hang. 2016. The effect of theoretical assumptions on pedagogical methods: A case study of second language Chinese tones. International Journal of Applied Linguistics 27(2). 363–382.
Zhang, Hongming. 1992. Topics in Chinese phrasal tonology. Doctoral dissertation. UCSD, San Diego, CA.
Zhang, Hongming. 2013. Hanyu yuyanxue yu Meiguo bentu hanyu jiaoxue [Chinese linguistics and Chinese language teaching in the United States]. Newsletter of the International Society for Chinese Language Teaching 2013(4). 14–19.
Zhang, Hongming. 2017. Syntax-phonology interface: Argumentation from tone sandhi in Chinese dialects. London & New York, NY: Routledge.
Zhu, Xiaonong. 1999. Shanghai tonetics. Munich: Lincom Europa.
Zhu, Hua. 2002. Phonological development in specific contexts: Studies of Chinese-speaking children. Clevedon: Multilingual Matters.
Zhu, Xiaonong. 2005. Shanghai shengdiao shiyanlu [An experimental study of Shanghai tones]. Shanghai: Shanghai Educational Press.
Zhu, Hua and Barbara Dodd. 2000. The phonological acquisition of Putonghua (Modern Standard Chinese). Journal of Child Language 27(1). 3–42.
Throughout this paper, bilingual children, including both simultaneous bilingual children and successive ones, are defined as those who are able to comprehend or produce two languages/dialects regardless of the level of proficiency or use and the age at which the languages/dialects were learned (adapted from Grech & McLeod 2012; also cf. Valdés & Figueroa 1994; Royal College of Speech and Language Therapists 2006; International Expert Panel on Multilingual Children's Speech 2012; among others).
Although Law & So (2006), So & Leung (2006), and X. Li (2020) all examined Mandarin-dominant bilingual children's tone production, the dual input received by their child participants was more evenly distributed. Despite government efforts to promote Mandarin ever since the 1950s, Cantonese is still regarded as “the only dialect that may match Putonghua in terms of geographical and social strength” (P. Chen 1999) and “the strongest Chinese dialect in terms of prestige and the number of mainland Chinese attracted to learn it” (D. C. S. Li 2006), and it is vital in both Hong Kong and Guangdong. In Hong Kong, Cantonese is the predominant language, used by 88.2% of the population aged 5 and over, according to the 2021 population census (see the Hong Kong government website: https://www.gov.hk/en/about/abouthk/facts.htm). As for its status in Shenzhen, while the overall language situation in Shenzhen is more complex than in Hong Kong and other cities in Guangdong due to the large number of immigrants, Shenzhen is a city in which “Mandarin and Cantonese are the primary languages, coexisting with multiple languages/dialects” (Tang & Liu 2008; Hu & Zhang 2020). Therefore, the Cantonese-Mandarin bilingual children in Law & So (2006) and So & Leung (2006) are likely to have had relatively extensive exposure to Cantonese in their daily lives. As for the Mandarin-Southern Min bilingual children in X. Li (2020), most of them lived and were raised in the rural areas of Xiamen, so they are likely to have received ample input in the Southern Min dialect. In contrast, the position of the Fuzhou dialect in the urban areas of Fuzhou is much weaker. According to the survey results of S. Li (2013), over 80% of urban participants rarely or only occasionally used the Fuzhou dialect in everyday life, and less than 20% of families used Fuzhou as their primary household language.
Following Chinese tradition, here tones in the brackets are represented as points along a five-point scale, a notation based on that of Chao (1930), and the digits indicate the pitch height, 5 being the highest and 1 the lowest.
It is difficult to recruit monolingual speakers of the Fuzhou dialect nowadays. Although such individuals do exist, the number is rather limited, and the majority of them are elderly and/or live in the rural areas, which renders them unsuitable as participants of this study.
A reviewer suggested that the school environment may be a somewhat unnatural setting for the children to speak the Fuzhou dialect, as school is associated with Mandarin. We did consider this aspect before and during the data collection process. When we negotiated the experiment site with the parents, most parents told us that they preferred interviews and recordings in public places rather than at home. Many parents had concerns about privacy and even safety when inviting the researchers (who were complete strangers to them before the experiment) into their homes, which is why we had to conduct the experiment in a school or community center. Public places such as schools and community centers are familiar to both parents and children, which we believe could also provide a comfortable or neutral environment for them. To further reduce any potential influence of the testing environment, we gave instructions in the Fuzhou dialect in the elicitation task of Fuzhou words (also discussed in fn. 6 below), to help the children feel at ease while speaking Fuzhou.
The instructions were presented in Chinese characters using Microsoft PowerPoint on the experimenter's laptop, accompanied by simultaneous verbal instructions. Children received instructions in Mandarin for the elicitation task of Mandarin words, while the instructions were given in the Fuzhou dialect when the elicitation task of Fuzhou words was carried out.
A reviewer pointed out that the physiological account proposed by Wong (2012, 2013), as mentioned in our discussion of the bilingual children's production performance of Fuzhou T4, could serve as an alternative explanation. We acknowledge that articulatory complexity may play a role. However, we do not adopt the physiological account to explain all the results in our study. This decision reflects the many unresolved disagreements in previous studies regarding the articulatory complexity of tones other than Mandarin T3. For instance, Wong (2012) argued that the descending order of accuracy of children's four Mandarin tones was T4 > T1 > T2 > T3, which followed the order of articulatory complexity, while R. Xu et al. (2018) suggested that the order should be T2 > T1 > T4 & T3. Discrepancies in terms of the articulatory complexity of different tones were also found in previous studies on Cantonese. For example, Wong, Fu & Cheung (2017) reported that 3-year-old Cantonese-speaking children produced Cantonese T5 (low rising tone [23]) with the highest accuracy, while Tse (1978) and So & Dodd (1995) showed that Cantonese T5 was one of the most difficult tones for children to master. Therefore, it is challenging to determine the degree of articulatory complexity for each tone based on previous studies and the data collected in the current study. Future research into children's physiological development is necessary to shed light on this issue.
As the reviewers insightfully pointed out, children are agents of change, and the 6- to 7-year-old Mandarin-Fuzhou bilingual children in the current study might indeed produce the Fuzhou tones differently, rather than incorrectly. The reviewers suggested that the generation of these bilingual children may adjust the Fuzhou dialect in subtle ways, which is a phenomenon commonly observed in language evolution, especially in the context of language contact. The subtle adjustments could result in tone changes such as the tone mergers in Iu Mien (Luang-Thongkum 1997) and the development of a tone system in the atonal Austronesian language Tsat (Thurgood 1999). The results of our current study on 6- to 7-year-old bilingual children suggest that certain tone mergers may be occurring in the Fuzhou dialect. For example, the significantly longer duration of Fuzhou T6 and T7 in children's speech, in comparison to adults', may result in the gradual loss of the checked/non-checked contrast. Although, based on the data collected so far, we have not observed such categorical tone changes in the group of Mandarin-Fuzhou bilingual children aged around 12, we agree with the reviewers about the possibility of long-term linguistic changes in the Fuzhou-speaking community, especially when the 6- to 7-year-olds in our current study and those of even younger generations grow into adulthood. To explore the developmental trajectory of the younger generations in acquiring the Fuzhou tonal system and the possible changes resulting from subconscious adjustments made by these children during their acquisition process, more extensive data collection and further longitudinal investigation are necessary.