Abstract
The paper investigates the usage of the Hungarian connective vagy ‘or’. Our starting point is Ariel & Mauri's (2018, 2019) and Ariel's (2020) papers about the use of or, where they argue that its core meaning is ‘alternativity’. Our goal is to describe Hungarian vagy ‘or’ by analyzing various corpus data, and compare the results. We examined the personal subcorpus of the Hungarian National Corpus (MNSZ2), and the Hungarian Spontaneous Speech Database (BEA). In this paper, as a tribute to the memory of László Kálmán, we investigated a third corpus that is constructed from Kálmán's very popular informative texts on Qubit.
1 Introduction
In this paper we investigate the usage of the Hungarian connective vagy ‘or’. Our starting point is Ariel and Mauri's work on or in English, especially Ariel (2020), which is based on Ariel & Mauri's (2018) empirical paper with the title “Why use or?”. They argue that the traditional logical interpretations fail to capture the speakers' intended readings. In order to determine the actual usage of or, they have analyzed corpus data. In our research, we study different types of Hungarian texts: written and spoken language corpora, together with a collection of popular scientific articles written by László Kálmán.
The research presented in this paper is part of a larger study on the usage of Hungarian vagy ‘or’. The aims of the research are the following. First, to examine vagy in different types of texts. Second, to compare Ariel's (2020) typology with our findings: whether Ariel's types are represented in Hungarian, and if they are, how similar their proportions are. And third, to clarify the typology for the purpose of formal analysis. In this paper, we present our results concerning three types of text: the personal subcorpus of the Hungarian National Corpus (MNSZ2), which is a written corpus containing informal texts; the Hungarian Spontaneous Speech Database BEA (Mihajlik et al. 2022), which is a spoken corpus of Hungarian; and a special corpus constructed from László Kálmán's selected articles, which will be the main focus of this paper, in tribute to him.
In addition to exploring the usage of vagy ‘or’ in Hungarian, our research can contribute to the description of written vs. spoken language, and also informal vs. formal style by examining the characteristics of the connective or in different types of texts. Our results indicate that the actual meaning of or, the relevant, speaker-intended meaning highly depends on the text type: spoken or written, informal or formal. This is the motivation behind the investigation of various genres.
The paper is organized as follows. In Section 2, we briefly discuss the two main approaches to or: the traditional logical view, and the usage-based model. In Section 3, we describe the process of collecting our data. Section 4 is about Ariel's (2020) typology of or. Section 5 presents our findings in parallel with a detailed analysis of Ariel's (2020) system. We summarize our results in Section 6 by emphasizing the similarities and differences between Ariel's (2020) system and the different types of Hungarian data, with special emphasis on the characteristics of László Kálmán's or constructions in his articles. Finally, in Section 7, we outline the directions for future research.
2 Approaches to or
In this section, we briefly discuss first the traditional logical approach to or, and then the usage-based approach that Ariel & Mauri (2018) argue for.1 Both approaches offer detailed and adequate descriptions of or; however, analyzing actual usage is better suited to describing what or means in communication.
2.1 Traditional logical approach
According to the standard approach, or has two readings: an inclusive one (the lexical meaning), where at least one, but possibly all, disjuncts are true; and an exclusive one, where exactly one disjunct is true. The proper reading in a given sentence typically depends on contextual factors and cultural/encyclopedic knowledge. For instance, (1) is usually interpreted as an inclusive or: ‘I may ask Peter’, or ‘I may ask Paul’, or ‘I may ask both Peter and Paul’; while (2) is normally interpreted as an exclusive or: ‘I can only marry one of them’.
(1) I will ask Peter or Paul.
(2) I will marry Peter or Paul.
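As a minimal illustration (ours, not part of the cited accounts), the two classical readings can be written as truth functions. The function names below are hypothetical; the sketch only shows that the readings differ in a single case, namely when both disjuncts are true.

```python
from itertools import product


def inclusive_or(p: bool, q: bool) -> bool:
    """Inclusive reading: at least one disjunct is true."""
    return p or q


def exclusive_or(p: bool, q: bool) -> bool:
    """Exclusive reading: exactly one disjunct is true."""
    return p != q


# Collect the truth-value assignments on which the two readings disagree.
differences = [
    (p, q)
    for p, q in product([True, False], repeat=2)
    if inclusive_or(p, q) != exclusive_or(p, q)
]

for p, q in product([True, False], repeat=2):
    print(p, q, inclusive_or(p, q), exclusive_or(p, q))
```

In this sketch, `differences` contains only the assignment where both disjuncts hold, which is the situation that separates the reading of (1) from that of (2).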
There is a discrepancy between the logical and the conversational interpretation of or: the logical operation of disjunction (also called alternation) ‘∨’ represents the inclusive reading, even though in natural discourse the exclusive reading is much more common. Grice (1989) argues that only inclusivity follows logically from a disjunctive sentence, since exclusivity is cancellable, as well as the assumption that the speaker does not know which disjunct is true. For instance, (3) and (4) are possible continuations of (2).
(3) … or maybe both, but not at the same time, of course.
(4) … but I am not going to tell you which one.
Therefore, according to Grice (1989), the semantic meaning of or is inclusivity, while exclusivity and the speaker's ignorance are conversational implicatures, which derive from the Cooperative Principle, specifically the maxim of quantity.
These two predominant readings have been supplemented by a third one, free choice (Kamp 1973), where the disjunction is interpreted as a conjunction in the scope of a modal operator (5).
(5) You may take a cookie or an apple.
In (5), both options are allowed, so the sentence is true only if ‘you may take a cookie’, and ‘you may take an apple’ are both true, which is the interpretation of conjunction.
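The contrast between a plain disjunction under a possibility modal and the free choice reading can be sketched with a toy model. This is our own simplification, not Kamp's (1973) formalization: a "world" is assumed to be the set of items taken, and `may` is a hypothetical helper that checks whether a proposition holds in at least one permitted world.

```python
def may(permitted_worlds, prop):
    """Possibility modal: prop holds in at least one permitted world."""
    return any(prop(world) for world in permitted_worlds)


def takes_cookie(world):
    return "cookie" in world


def takes_apple(world):
    return "apple" in world


def plain_disjunction(permitted_worlds):
    """may(cookie or apple): some permitted world contains one of them."""
    return may(permitted_worlds, lambda w: takes_cookie(w) or takes_apple(w))


def free_choice(permitted_worlds):
    """may(cookie) and may(apple): each option is permitted on its own."""
    return may(permitted_worlds, takes_cookie) and may(permitted_worlds, takes_apple)


# Hypothetical permission states.
only_cookie_allowed = [frozenset({"cookie"})]
both_allowed = [frozenset({"cookie"}), frozenset({"apple"})]

print(plain_disjunction(only_cookie_allowed), free_choice(only_cookie_allowed))
print(plain_disjunction(both_allowed), free_choice(both_allowed))
```

When only the cookie is permitted, the plain disjunction is still satisfied but the free choice reading is not, which mirrors the observation that (5) is understood conjunctively.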
Most of the Hungarian linguistic tradition also bases the interpretation of the connective or on the traditional logical approach, whether they use a syntactic (see Bánréti 1992, 2022), a logical-semantic (Kálmán 2001), or an educational (Bereczkei 2007; Rounds 2001; Fercsik 2001) approach. At the same time, corpus-based studies did not place much emphasis on the connective or (Gábor, Héja & Mészáros 2004; Trebits 2009).
2.2 Usage-based approach
Ariel & Mauri (2018) argue that none of the traditional approaches discussed above captures speakers' intended readings. They conducted an empirical study to explore why speakers use or constructions in natural discourse. They examined all or occurrences (1,053 tokens) in the Santa Barbara Corpus of Spoken American English and found 20 types of recurrent readings. They argue that “the significance of inclusive and exclusive interpretations, as discourse-relevant readings, has been overrated, in that they have been elevated to speaker-intended messages, when in fact, they are merely assumptions compatible with speakers' messages” (Ariel & Mauri 2018, 941). In other words, although we can calculate the truth conditions of a sentence by applying the traditional logical approach to or, this does not bring us closer to the speaker's intention, that is, to answering “Why use or?” in that particular discourse. Some readings are rather different from the traditional interpretations, for instance, when the speaker's goal is to raise possible options without committing to either one of them, or to construct a higher-level category by stating just a few examples. Furthermore, in the spirit of the usage-based approach, they also redefine the third traditional reading, free choice, so as to convey speaker-intended messages.
Based on the collected data, Ariel & Mauri (2019) propose “An ‘alternative’ core for or”: ‘alternativity’. They argue that – since semantic meanings cannot be pragmatically canceled – ‘inclusivity’ cannot be or's core meaning. It is too weak in some cases, when the speaker commits to all alternatives; and it is too strong in some other cases, when the speaker intends only one alternative to be realized. In their view, the interpretation of or is procedural, rather than conceptual, meaning that it instructs the addressee to construe the given options as alternatives. Also, or has a subjective, rather than objective meaning (similarly to ‘contrast’), since being alternatives to each other is not an inherent property.
In sum, the two approaches have different aims and views, yet the latter has more potential for describing the meaning of or in communication, which is our aim with this research.
3 Collecting data
To collect Hungarian data, our first step was to examine the personal subcorpus of the Hungarian National Corpus (MNSZ2), which is a written corpus and consists of forum and social media posts. Three independent annotators categorized 223 vagy ‘or’ occurrences from a random sample (Viszket, Kárpáti & Kleiber 2021). As the next step, we started to explore other areas of language use and compare the proportions of different types of or in the different text types. First, we examined a smaller randomly selected sample of the BEA spoken language database with 88 vagy ‘or’ occurrences, which were categorized by two independent annotators (Kárpáti et al. 2022). These two corpora were created by only selecting sentences with vagy ‘or’.
In this paper, our focus is on a special collection of texts consisting of popular scientific articles written by László Kálmán (“Kálmán corpus”), and we compare his usage of vagy ‘or’ to the previously collected data: Ariel's findings about English, on the one hand, and Hungarian corpus data obtained from MNSZ2 and BEA, on the other hand. The following rules were laid down (cf. Hunston 2008) when building the Kálmán corpus. First, we tried to match the size of the corpus (number of tokens) to the previously created corpora we examined. Second, all the sentences of the selected texts were added to the corpus (not just the ones containing or), in order to be able to compare the or/token ratio with the results found in Ariel's study. Third, the texts were divided into utterance units (typically sentences), each containing at most one or. Within the text, we highlighted the vagy ‘or’ occurrences. We have not yet performed any further computational linguistic analysis on the corpus (tokenization, lemmatization).
We created the above-mentioned Kálmán corpus from his publications on Qubit, a website about economy, technology, and science. Kálmán has 70 popular scientific articles on this site, with striking, provoking, often humorous titles (“humor and promotion”). Basically, all areas of linguistics are covered, with a relatively large number of articles about part-of-speech categories, syntactic relations, computational linguistics, and language technology. We randomly selected 16 of the 70 articles and added their full texts to the corpus in accordance with our corpus design. With this method, we obtained a corpus of 30,309 words, on which three independent annotators classified the connective or (141 occurrences) into the categories based on Ariel (2020) and our previous analyses of Hungarian corpora.
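The or/token ratio mentioned in the corpus design can be computed directly from the two figures reported for the Kálmán corpus. The sketch below is only an illustration: it assumes that the reported word count can stand in for the token count (no tokenization has been performed on the corpus), and the helper name `per_thousand` is hypothetical.

```python
def per_thousand(occurrences: int, words: int) -> float:
    """Relative frequency of a connective per 1,000 words."""
    if words <= 0:
        raise ValueError("word count must be positive")
    return 1000 * occurrences / words


# Figures reported above for the Kálmán corpus: 141 vagy tokens in 30,309 words.
kalman_rate = per_thousand(141, 30_309)
print(f"{kalman_rate:.2f} occurrences of vagy per 1,000 words")
```

Normalizing per 1,000 words is one common convention; any other base would serve equally well for comparing corpora of different sizes.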
Assigning the proper types to the utterances was quite challenging: there were frequent disagreements between the annotators, which were eventually resolved by consensus. For all three corpora, the observed agreement before consensus (Landis & Koch 1977) fell between 87% and 90%, which is considered a convincing ratio. To resolve the ambiguities, we tried to find formal criteria for distinguishing the types.2 Among the three examined corpora, the Kálmán corpus was the least difficult to analyze (even if the observed agreement rates do not clearly reflect this), since the topics Kálmán discussed belong to our field of expertise, and thus, we are familiar with the set of elements connected with or, and the complementary set of statements. This could be the reason why the categorization of Kálmán's or constructions turned out to be less controversial than the categorization of general written or spoken corpus data.
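Observed agreement is the share of items on which annotators chose the same category. The sketch below shows one way to compute it; averaging over annotator pairs is our assumption for the three-annotator case, since the aggregation method is not specified above, and the label lists are invented purely for illustration.

```python
from itertools import combinations


def observed_agreement(labels_a, labels_b):
    """Proportion of items that two annotators labeled identically."""
    if len(labels_a) != len(labels_b):
        raise ValueError("annotators must label the same items")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def mean_pairwise_agreement(annotations):
    """Average observed agreement over all pairs of annotators (assumed aggregation)."""
    pairs = list(combinations(annotations, 2))
    return sum(observed_agreement(a, b) for a, b in pairs) / len(pairs)


# Hypothetical labels for five utterances; not taken from the corpora.
annotator_1 = ["Narrowed", "HLC", "Raised options", "Narrowed", "Equivalence"]
annotator_2 = ["Narrowed", "HLC", "Narrowed", "Narrowed", "Equivalence"]
annotator_3 = ["Narrowed", "Indifference", "Raised options", "Narrowed", "Equivalence"]

agreement = mean_pairwise_agreement([annotator_1, annotator_2, annotator_3])
print(f"{agreement:.2%}")
```

Observed agreement does not correct for chance; a chance-corrected coefficient would require the full label distributions.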
4 Ariel's (2020) Typology for or
In their papers, Ariel & Mauri (2018, 2019) analyze several types of or constructions found in the Santa Barbara Corpus (SBC), such as Raised options, Higher-level category, Conjunctive and Choice. Certain types are mentioned but not discussed in detail, for instance Repair, Hedging, Equivalence and Indifference. In her recent paper, Ariel (2020) refines their previous typology, and constructs a somewhat different hierarchy based on four parameters. In our examination of Hungarian vagy ‘or’, we applied this particular typology.
The most unexpected result of their corpus analysis is that speakers rarely use or in the traditional logical sense, when there is a choice to be made between distinct alternatives (6).
(6) Was that World War Two, or World War One. (Ariel 2020, 2)
Instead, the most frequent or reading is the so-called Higher-level category (HLC), which is the “furthest” from this interpretation (this intuitive distance is explicated in her paper using the 4 parameters; see Table 1 below). In the case of an HLC, the speaker introduces a single concept, there are no multiple discourse entities. For instance, in (7), the speaker means ‘non-political ruler, monarch’, which is the higher-level category the or construction refers to.
(7) Who was the king or queen? (Ariel 2020, 2)
Table 1. Types of or in Ariel (2020, 10): the hierarchy of or interpretations
Type | Multiple discourse entities | Commitment to only one alternative | Exhaustivity | Distinct contextual implications
Dilemma | + | + | + | +
Unresolved choice | + | + | + | +
Narrowed | + | + | + | −
Indifference | + | + | (+) | −
Free alternative | + | +a | NA | −
Something like that | + | + | − | −
Raised options | + | − | − | (−)
Equivalence | − | − | NA | −
Higher-level category | − | − | − | −
Ariel (2020) argues that an implicit HLC is always present, in the sense that at least the alternatives are from this category. The difference between constructions like (6) and constructions like (7) is the speaker's focus: whether the speaker focuses on the members (6), or on the higher-level category (7). In this way, she distinguishes between two groups of or constructions: member-focus and category-focus constructions. She discusses 7 types of the former, and 2 types of the latter, 9 types of or constructions altogether. She constructs a hierarchy with respect to 4 parameters: (i) multiplicity, if there are multiple discourse referents; (ii) exhaustivity, if the alternatives exhaust the possibilities; (iii) commitment to one alternative, if the speaker commits to exactly one alternative; and (iv) distinct contextual implications, if the choice between the alternatives actually matters (Table 1).
Considering the opposing ends of the scale – Dilemma and Unresolved choice on the top with 4 positive values, and Higher-level category on the bottom with 4 negative values – she argues that in order to be able to capture the various interpretations, the core meaning of or cannot be ‘inclusivity’, but rather the minimal ‘alternativity’. We elaborate on the types in the next section, giving examples for each of them.
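For readers who wish to work with the hierarchy computationally, the parameter values of Table 1 can be encoded as a small lookup structure. This is a sketch of our own, not part of Ariel's (2020) proposal: the names `TYPOLOGY` and `count_values` are hypothetical, bracketed values are counted together with their unbracketed counterparts, "NA" is counted on neither side, and the footnote mark on the Free alternative commitment value is dropped.

```python
PARAMETERS = (
    "multiple discourse entities",
    "commitment to only one alternative",
    "exhaustivity",
    "distinct contextual implications",
)

# Values follow Table 1 row by row; "-" stands for the minus sign.
TYPOLOGY = {
    "Dilemma": ("+", "+", "+", "+"),
    "Unresolved choice": ("+", "+", "+", "+"),
    "Narrowed": ("+", "+", "+", "-"),
    "Indifference": ("+", "+", "(+)", "-"),
    "Free alternative": ("+", "+", "NA", "-"),  # "+a" in the table; footnote mark dropped
    "Something like that": ("+", "+", "-", "-"),
    "Raised options": ("+", "-", "-", "(-)"),
    "Equivalence": ("-", "-", "NA", "-"),
    "Higher-level category": ("-", "-", "-", "-"),
}


def count_values(type_name):
    """Return (number of positive values, number of negative values) for a type."""
    values = [value.strip("()") for value in TYPOLOGY[type_name]]
    return values.count("+"), values.count("-")


# Types with "+" for multiple discourse entities.
multi_entity_types = [name for name, values in TYPOLOGY.items() if values[0] == "+"]

for name in TYPOLOGY:
    print(name, count_values(name))
```

In this encoding, the number of positive values never increases from the top of the table to the bottom, and the seven types with a positive first parameter coincide with the seven member-focus types.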
5 Hungarian or in the light of Ariel's (2020) typology
We present our findings for Hungarian parallel to Ariel's (2020) data, so that the results can be directly compared.
5.1 Member-focus or constructions
Ariel's (2020) hierarchy contains 7 types of member-focus or constructions: Unresolved choice, Dilemma, Narrowed, Indifference, Free alternative, Something like that, and Raised options. In this subsection, we discuss these types, first Ariel's (2020) findings in English, and then the Hungarian data based on our corpus-results.
5.1.1 Unresolved choice and Dilemma
At the top of the continuum, with 4 positive values, are Unresolved choice and Dilemma. Unresolved choice (8) expresses the speaker's need or inability to resolve a choice. Dilemma (9) can be regarded as a specialized sub-construction, where the choice is important and difficult. Ariel (2020) also notes that these types involve interrogatives in some form, whether direct, rhetorical or embedded.
(8) I don't remember if it was Evelyn or Deborah. (Ariel 2020, 4)
(9) “To be, or not to be: that is the question” (Hamlet, III:1) (Ariel 2020, 4)
Unresolved choice and Dilemma are the closest to the common, everyday interpretation of or: the alternatives are very distinct – in fact, the majority (23/34) is structured as ‘p or not p’ (Ariel 2020, 4) –, and the choice is actually relevant. Also, the speaker commits to only one alternative, and the list is exhaustive. However, they found that these are not typical or constructions in the sense that other types, which are much more highly represented in the corpus, “do not aim to highlight the distinctness between the alternatives profiled” (Ariel 2020, 4).
In the examined Hungarian corpora (MNSZ2 and BEA), we could identify very few examples for both types, which supports Ariel's (2020) findings. The proportions of Unresolved choice (including the sub-construction Dilemma) were 3.23% in the English corpus (Ariel 2020), 3.59% in MNSZ2, and only 1.14% in BEA. As for the exact categorization, it was quite problematic to distinguish between Unresolved choice and Dilemma, because the choice being “important and difficult” is rather subjective.
For this reason, we tried to find a formally capturable criterion for distinguishing Dilemma from Unresolved choice. We propose that an Unresolved choice case involves a real question, in the sense that the addresser expects an answer from the addressee (cf. Gyuris 2017), like in (10), where the addresser is asking for directions. (Note that we kept the original spelling and typography of the corpus examples, except for a few extremely misspelled ones.) On the other hand, a Dilemma case involves a rhetorical question, where the addresser does not expect the addressee to provide an answer (11).
(10) a városmajor az lent van egészen, meggyes út; […] vagy fel kell menni a kékgolyóban valamennyit? (MNSZ2 doc#993)
‘Is Városmajor far down, on Meggyes Street; or do you need to go up a bit on Kék Golyó Street?’
(11) vajon mi környezetszennyezőbb, egy hagyományos levél vagy egy e-mail? (MNSZ2 doc#973)
‘I wonder which one is more polluting: a traditional mail or an e-mail?’
Rhetorical questions in Hungarian are often marked with the ‘hesitative’ discourse particle vajon ‘I wonder’ (see e.g., Gärtner & Gyuris 2012), as is also the case in (11). The distinction between an actual and a rhetorical question has been formalized in ℜeALIS (Alberti et al. 2019); and we argue that the same formal distinction can be made between Unresolved choice and Dilemma based on the interlocutors' BDIs (beliefs, desires, and intentions).
In the Kálmán corpus there was no example for Dilemma, and there were 7 cases of Unresolved choice (4.96%). Let (12) serve as an illustration, which is the title of one of his articles regarding the mix-up of two suffixes – therefore it is clearly an Unresolved choice.
(12) -ban/-ben vagy -ba/-be? Megoldjuk okosba! (title) (Kálmán corpus)
‘-ban/-ben ‘in’ or -ba/-be ‘to’? We'll handle it, no worries!’ (alternatives for Hungarian suffixes: -ban/-ben: inessive allomorphs; -ba/-be: illative allomorphs)
5.1.2 Narrowed and Indifference
The next two types, Narrowed and Indifference, differ from the previously discussed ones in one parameter: they involve a single set of contextual implications (the fourth parameter is minus). In other words, the choice between the alternatives is not relevant, while there are still referentially distinct alternatives, the speaker commits to only 1 alternative, and the list is (typically) exhaustive.
In a Narrowed case, the speaker cannot provide a fully informative statement, though still trying to be as informative as possible (13). In this way, these utterances are not maximally informative, but relevant enough, which is in harmony with Grice's (1989) view on the function of or. In this sense, the type Narrowed is the closest to the traditional ‘exclusive or’ reading.
(13) Wilma is dating Albinoni or Boccherini. (Ariel & Mauri 2018, 970)
(14) I mean whether the horse is being used a lot or not, that's twelve bucks. (Ariel & Mauri 2018, 994)
Indifference (14) has the same parameter values as Narrowed; however, there is an extra emphasis on the fact that the choice is completely irrelevant. In fact, the point of an utterance like (14) is that these drastically different options do not carry different contextual implications (‘12 bucks either way’). Surprisingly, Ariel & Mauri (2018) report very few Narrowed type or constructions, only 5 occurrences (0.47%), and the proportion of Indifference cases is also very low, only 1.23% (Ariel 2020).
When analyzing Hungarian data, we found significantly more examples for these two types: 1.7% in SBC, 14.78% in BEA, 38.56% in MNSZ2, and 42.55% in the Kálmán corpus. It can be observed that the percentages increase as the text style becomes more formal. Furthermore, Narrowed is approximately twice as common as Indifference in the Hungarian corpora BEA and MNSZ2, similarly to English. However, the ratio of Narrowed was extremely high in the Kálmán corpus, 36.88% (52 occurrences), which makes this type of or constructions the most characteristic of Kálmán's Qubit articles. The following examples serve as illustrations for Narrowed: (15) from MNSZ2, and (16) from the Kálmán corpus, which also demonstrates why linguistic texts are easier for us to analyze.
(15) Érvénytelen azonosító vagy jelszó! (MNSZ2 doc#1804)
‘Invalid ID or password!’
(16) például latin aurum (semleges nemű) arany (egyes szám alany- vagy tárgyeset) (Kálmán corpus)
‘for instance, Latin aurum (neuter) gold (singular, nominative or accusative case)’
As for the classification of Indifference, we found that it frequently depended on individual interpretation: what counts as “drastically different” for the annotator. To avoid this problem, we tried to find formal criteria to distinguish Indifference cases. We propose two grammatical tests for Hungarian: (i) the construction can be rephrased as an akár-akár-structure (‘whether … or …’); (ii) the clause in which the or construction appears is typically subordinate (17). Additionally, it can be observed that the structure of an Indifference case sentence is frequently (p ∨ ¬p) → q, where (p ∨ ¬p) is a tautology (always true); therefore, the sentence simply states q, no matter what (18). Both these tests, and also our intuitions, indicate that (19) from the Kálmán corpus is also a case of Indifference.
(17) Manapság, legyen szó köznapi vagy jogszabályi nyelvhasználatról, a “számla” kifejezésnek több értelme van. (MNSZ2 doc#1740)
‘Nowadays, whether it is common or statutory language, the expression “invoice” is ambiguous.’
(18) Hiszitek vagy sem, de nem csak a filmeket és a videojátékokat szokták “remake-elni”! (MNSZ2 doc#1813)
‘Believe it or not, but it's not just movies and video games that get a “remake”!’
(19) ami nem tartalmazza, hogy a feltétel ne teljesülhetne most vagy a jövőben (Kálmán corpus)
‘which does not include that the condition could not be fulfilled now or in the future’
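The logical observation about Indifference sentences of the form "whether p or not p, q" can be checked mechanically with a truth table. The sketch below is only an illustration of that point; the function names are hypothetical, and material implication is assumed for the conditional.

```python
from itertools import product


def implies(antecedent: bool, consequent: bool) -> bool:
    """Material implication."""
    return (not antecedent) or consequent


def indifference_schema(p: bool, q: bool) -> bool:
    """(p or not p) -> q, the frequent shape of Indifference sentences."""
    return implies(p or (not p), q)


assignments = list(product([True, False], repeat=2))

# The antecedent is a tautology: it is true under every assignment.
antecedent_is_tautology = all(p or (not p) for p, _ in assignments)

# Hence the whole schema has the same truth value as q alone.
schema_equals_q = all(indifference_schema(p, q) == q for p, q in assignments)

print(antecedent_is_tautology, schema_equals_q)
```

Since the truth value of p never affects the result, the disjunction contributes nothing truth-conditionally, which fits the view that its role in such sentences is to signal that the choice is irrelevant.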
When categorizing corpus examples, we often found it difficult to differentiate between Narrowed and Unresolved choice. They only differ in one parameter: whether the choice is relevant or not. This seems to be a clear criterion in theory; in practice, however, we could often interpret these sentences both ways. A straightforward difference between the two types is modality: an Unresolved choice case always involves a question, while a Narrowed case is typically an assertion. Nevertheless, an interrogative sentence can still have a Narrowed reading (20), instead of an Unresolved choice (21), i.e., a yes-no question instead of an alternative question. It depends on information structure, namely the presence of focus, which is marked by word order in Hungarian, and (mostly) stress in English. (Note that the presence of focus is not the only distinguishing prosodic feature in disjunctive questions in English (see Pruitt & Roelofsen 2013), but this issue is not relevant for the purpose of this paper.) Nevertheless, prosody can play a role in distinguishing between or interpretations (cf. Surányi & Gulás 2022).
(20) Találkoztál Péterrel vagy Pállal?
‘Have you met Peter or Paul?’ (emphasis on only Peter)
(21) Péterrel vagy Pállal találkoztál?
‘Have you met Peter or Paul?’ (emphasis on both Peter and Paul)
As for the formal pragmasemantic analysis, the difference between these types can be captured by a simple parameter in ℜeALIS, namely the addresser's Desire/Intention to find out which alternative is true: it is positive in an Unresolved choice case, but neutral in a Narrowed case.
5.1.3 Free alternative and Something like that
The next two types in Ariel's (2020) hierarchy, Free alternative, and Something like that, have 2 positive and 2 negative (or not applicable) values. The alternatives are referentially distinct (+), and the speaker commits to only 1 alternative (+); but the list is not exhaustive (−), and the alternatives do not trigger different contextual implications (−).
For instance, in (22), which illustrates a Free alternative or construction in Ariel's (2020) system, there are three referentially distinct alternatives; however, ‘the clown’ and ‘the ninja’ count as one option, the new alternatives (as opposed to ‘the Tick’, the old one), and so the choice between them is not relevant. In this case then the speaker actually commits to the possibility of both alternatives, but the realization of only 1 (Ariel 2020, 10). The X or something like that subcategory is for highly similar alternatives, where the referential distinction is so unimportant that the speaker does not even identify all the alternatives (23). In the Kálmán corpus, there was no example for the category Something like that, which is not surprising, considering the colloquial nature of the expression. Free alternative or constructions were not typical either: we have found only 5 instances (3.55%) (24).
(22) Well would you wanna be a clown or a ninja, instead of the Tick? (Ariel 2020, 5)
(23) A couple tortillas, or maybe a sandwich, or something along that line. (Ariel 2020, 6)
(24) Amikor generikus eljárást vagy adattípust használunk (tehát olyat, aminek a deklarációjához ilyen típuslistát csatoltunk) (Kálmán corpus)
‘When we use a generic procedure or data type (that is, one whose declaration has such a type list attached)’
In Hungarian, it was sometimes difficult to differentiate between Free alternative and Indifference. After examining the examples, we found a structural distinction, namely that a Free alternative or construction is typically part of the main statement/clause (25), while an Indifference structure (as mentioned above) is typically subordinate, less prominent (17)–(18).
(25) A Pioneer magnón kell lennie egy “A” vagy “Audio” nevű gombnak. (MNSZ2 doc#962)
‘On the Pioneer tape recorder there must be a button called “A” or “Audio”.’
(26) Esélyes eléggé, hogy esett a telefon vagy ilyesmi, de attól még nem kéne lehurrogni az emberkét. (MNSZ2 doc#1991)
‘There's a good chance that the phone fell or something, but you shouldn't be booing the guy anyway.’
The category Something like that can appear in different forms (similarly to English), for instance: vagy valami; vagy ilyesmi; vagy valami ilyesmi; vagy valami hasonló ‘or something like that’. This type is different from the other (usage-based) categories, in that it refers to the linguistic form. We argue that it is unnecessary to add Something like that as a separate type, since it can be categorized as a Free alternative, a Raised options, or an HLC or construction based on its meaning and function. We observed that the proper category depends on the number of named options: when at least two options are explicitly mentioned before the phrase ‘something like that’, it tends to be Free alternative or Raised options (cf. Ariel's example (23)); and when only one option is named, then there are no actual alternatives, thus no multiple discourse entities, and therefore a Higher-level category or construction is involved. For instance, in (26), the higher-level category is ‘something damaging happened’. In fact, in our investigation, we have only found HLC cases within Something like that constructions in the corpus (250 or occurrences).
5.1.4 Raised options
In a Raised options or construction, the speaker merely raises possible options, allowing a potential third option besides the mentioned alternatives (27). The utterance can be paraphrased as ‘maybe X, maybe Y’, as illustrated in (28), where a maybe explicitly appears, and like and or can be replaced with it without changing the meaning of the sentence.
(27) (voicemail message) Are you there? Or are you sleeping? (Ariel & Mauri 2018, 965)
(28) He's like 25 or 26, maybe 27… (Ariel & Mauri 2018, 966)
Raised options is the only category in Ariel's (2020) system that has 1 positive and 3 negative values – more precisely, the last value (distinct contextual implications) is bracketed. It means that the alternatives are referentially distinct, typically without distinct contextual implications (48/61). The list of the alternatives is not exhaustive, and the speaker does not commit to one of the alternatives being the case. This is apparent from the fact that the addition of maybe’s does not change the meaning. However, if we add maybe’s to an Unresolved choice or a Narrowed or construction, they become Raised options (Ariel 2020, 7), as in (29), which is the modified version of the Narrowed type (13).
(29) Wilma is dating maybe Albinoni, (or) maybe Boccherini.
We have found many Raised options examples in the Hungarian corpora (27.27% in BEA and 13.45% in MNSZ2), which corresponds with Ariel's observation that or constructions with more negative than positive values are much more common in everyday communication. For instance, (30) answers the question Mit csináljak? ‘What shall I do?’ by listing possible (not exhaustive) options. We have also noticed that, in several cases, Raised options constructions only have rhetorical functions: some or all of the mentioned alternatives are unrealistic, the speaker is just trying to make a point by listing more and more extreme options (31). Kálmán's articles differ from these informal texts in this respect as well: we have found only 4 occurrences (2.84%) of Raised options in the Kálmán corpus (32), demonstrating that a more strictly constructed paper contains far fewer open lists, ‘maybe’s’.
(30) Vigyél magaddal egy laptopot, és ugyanazt mint odahaza. Vagy korizzál a Dunán, vagy írjál nekem, mi van arrafele! (MNSZ2 doc#962)
‘Take a laptop with you and do as you would do at home. Or skate on the Danube or write to me about what's going on!’
(31) (Context: nobody noticed the speaker) Legközelebb vmi jól látható táblát akasztok a nyakamba (vagy előbb megbeszélem itt Veletek a randevút!) (MNSZ2 doc#973)
‘Next time I'll hang a board around my neck (or I'll arrange a meeting here with you!)’
(32) hogy nem valamilyen speciális élmény (félreértés, elvárás nem teljesülése vagy más) váltja ki (Kálmán corpus)
‘that it is not triggered by some special experience (misunderstanding, failure to meet expectations or something else)’
For identifying Raised options cases, Ariel's (2020) maybe-test appears to be applicable for Hungarian as well: adding the adverb talán ‘maybe’ does not change the meaning of the sentence. However, distinguishing Raised options from HLC can still be difficult, therefore we tried to find other (formal) criteria, and noticed that in Raised options, the alternatives typically appear in prominent structures, as part of a question, the main clause, or the main predicate.
5.2 Category-focus or constructions
The remaining two categories, Equivalence and Higher-level category (HLC) are category-focus or constructions, as opposed to the previously discussed member-focus ones. The speaker's focus is on a (higher-level) category, instead of the individual elements. This is the end of the continuum with 4 negative parameter values: there are no multiple discourse entities, no distinct contextual implication, no commitment to 1 alternative, and the list is not exhaustive (or the aspect is not applicable). In the Equivalence case, the members of the or construction are “distinct alternatives to each other only as far as linguistic expressions go” (Ariel 2020, 8), they are not distinct as discourse referents (33).
(33) So you are gonna want, or desire a pretty decent copy of your original tape. (Ariel 2020, 8)
(34) Who was the king or queen? (Ariel 2020, 8)
(35) ROY: saving the whale, or saving the polar bear,
PETE: [Right… Pandas]
ROY: or making sure there's enough grizzly bears, that's fine.
(Ariel & Mauri 2018, 973)
Ariel's research reveals that many or constructions emphasize the commonality, rather than the distinctness, of the alternatives (the types with no distinct contextual implications), and HLC constructions take this feature one step further (Ariel 2020, 11). We repeat (7) as (34) as a perfect example of this type, where there are clearly no multiple discourse entities; the speaker merely expresses a higher-level category, namely ‘nonpolitical ruler, monarch’. Example (35) demonstrates that the mentioned alternatives indeed serve only to describe a higher-level category: the addressee (Pete) agrees with the speaker (Roy), but he can still add a different alternative to the list, indicating that he interprets Roy's or construction as an HLC referring to ‘saving endangered animals’ (Ariel & Mauri 2018).
Analyzing MNSZ2 and BEA, we have identified a few Equivalence cases (36) and a large number of HLC constructions (37). The proportions we obtained are very similar to Ariel's results: Equivalence cases take up 3% of SBC (Ariel 2016), 4.55% of BEA, and 1.79% of MNSZ2, while the proportion of HLC or constructions is much higher: 23.17% in the case of SBC (Ariel 2020), 19.32% in BEA, and 17% in MNSZ2. On the other hand, the proportions of Kálmán's or types are reversed: Equivalence cases are rather common (9.22%), while HLC constructions are very rare (1.42%). The example for Equivalence (38) illustrates that in these contexts using synonyms is not ‘repair’: both expressions are needed for a particular reason. As for HLC, there were only 2 occurrences that at least 2 of the 3 annotators categorized as HLC; one of them is (39), which is far from a perfect example of this type. We can conclude that HLC constructions are practically missing from Kálmán's Qubit articles, which is, again, not surprising, since this genre requires much more precise wording than the texts in the other 3 examined corpora.
(36) Egy jellemvonást vagy sajátosságot a vizsgálódás szempontjai szerint minősíthetünk elhanyagolhatónak vagy lényegesnek. (MNSZ2 doc#1754)
‘A trait or feature can be classified as negligible or significant according to the aspects of the study.’
(37) Néha azt gondoljuk, hogy “Semmi öröm nincs az életemben” vagy “Az élet nehéz!”. (MNSZ2 doc#1738)
‘Sometimes we think that “There is no joy in my life” or “Life is hard!”.’
(38) A nyelvészek azt mondják rá, hogy produktív (vagy, ha magyar eredetű szóval akarják kifejezni, akkor azt, hogy termékeny) (Kálmán corpus)
‘Linguists say that it is productive (or, if they want to express it with a word of Hungarian origin, that it is termékeny ‘productive’)’
(39) az a személy nem „a valóságban”, hanem egy modellben létezik, amit mi alkotunk a valóságról (vagy bármi másról, amiről szó van) (Kálmán corpus)
‘that person does not exist “in reality” but in a model we create of reality (or whatever it is about)’
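The criterion mentioned above for HLC in the Kálmán corpus, agreement of at least 2 of the 3 annotators, can be illustrated with a small sketch. This is not the authors' annotation tooling: the function name `majority_label`, the item identifiers, and the sample annotations are hypothetical, and only the 2-out-of-3 threshold is taken from the text.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_label(labels: Sequence[str], min_votes: int = 2) -> Optional[str]:
    """Return the label chosen by at least `min_votes` annotators, else None."""
    if not labels:
        return None
    label, votes = Counter(labels).most_common(1)[0]
    return label if votes >= min_votes else None


# Hypothetical annotations (three annotators per item); not data from the paper.
annotations = {
    "item_a": ["HLC", "HLC", "Raised options"],
    "item_b": ["HLC", "Raised options", "Equivalence"],
    "item_c": ["Equivalence", "Equivalence", "Equivalence"],
}

# Items without a 2-out-of-3 majority stay unresolved (None).
resolved = {item: majority_label(votes) for item, votes in annotations.items()}
```

Under this scheme an item such as `item_b`, where all three annotators disagree, receives no category, which mirrors the ambiguity between neighbouring types discussed in this section.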
We found that identifying Equivalence constructions is more straightforward: it boils down to distinguishing denotative and connotative synonyms (Kiefer 2007), in other words, to deciding whether the alternatives express different entities or the same entity in different words.
HLC constructions, on the other hand, turned out to be much more problematic to identify, and especially to tell them apart from Raised options cases. In theory, the distinction between Raised options and HLC is clear: there are multiple discourse entities for the former, but not for the latter (while their other parameters match). In a Raised options case, an explicit or an implicit alternative is the correct one; while “Higher-level category asserts that a specific implicit concept, the abstracted category, is the case” (Ariel 2020, 11).
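The theoretical contrast just described can be made concrete with a minimal sketch that encodes the four parameters as boolean features. The class name `OrProfile`, the field names, and the helper `differing_parameters` are illustrative assumptions; the parameter values themselves follow the description in the text (all four negative for HLC, and only multiple discourse entities positive for Raised options). The ‘not applicable’ option for exhaustivity is simplified here to a negative value.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class OrProfile:
    """Four binary parameters of an or construction (field names are assumptions)."""
    multiple_discourse_entities: bool
    distinct_contextual_implications: bool
    commitment_to_one_alternative: bool
    exhaustive_list: bool


# Category-focus end of the continuum: all four parameters are negative.
HLC = OrProfile(False, False, False, False)
# Raised options: differs from HLC only in having multiple discourse entities.
RAISED_OPTIONS = OrProfile(True, False, False, False)


def differing_parameters(a: OrProfile, b: OrProfile) -> List[str]:
    """List the names of the parameters on which two profiles disagree."""
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]


difference = differing_parameters(HLC, RAISED_OPTIONS)
```

Because the two profiles differ in a single feature, any annotation decision between them hinges entirely on whether the alternatives are construed as separate discourse entities. Note that Equivalence would receive the same all-negative profile as HLC in this simplified encoding.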
In practice, however, an utterance can often be interpreted both ways, depending on the speaker's focus. When the specific examples are more prominent, it indicates a member-focus construction: Raised options. However, when the examples are only construed for deriving a higher-level concept, it indicates a category-focus construction: an HLC. This “competition” of prominence is the reason why “or constructions are member-focus or category-focus to varying degrees” (Ariel 2020, 14).
The analysis of the Hungarian data supports Ariel's (2020) findings, since we often could not assign a definite category to an example without knowing where the speaker's focus is supposed to be. In order to find the most probable category, we considered grammatical information, as with the previous types. We observed that syntactic structures can provide clues: options occurring in more prominent structures suggest Raised options (see (30) above), whereas options appearing in less prominent structures suggest an HLC (37).
5.3 Relevant or constructions outside Ariel's (2020) hierarchy
Ariel & Mauri (2018) mention another recurring reading of or: the ‘repair’ function. We have found examples of this type in all three corpora (40), although not to the same extent. The ratio is highest in the spoken language corpus BEA (18%) and second highest in the Kálmán corpus (7%), while the construction is least common in the informal subcorpus of MNSZ2 (1.35%).
(40) Egészen mást (vagy remélem, egészen mást) (Kálmán corpus)
‘Quite different (or I hope quite different)’
Another less studied reading of or is when it is actually not disjunctive but conjunctive, meaning ‘and’ (also mentioned in Ariel & Mauri 2018). We found a few examples of this type in all three corpora, most of them in the Kálmán corpus (3.55%) (41).
(41) és ugyanez áll például az orvosi vagy a jogi tudásbázisokhoz szükséges logikai nyelvekre is (Kálmán corpus)
‘and the same applies to, for example, the logical languages required for medical or legal knowledge bases’
Analyzing our spoken language corpus (BEA) and the informal written language (MNSZ2) did not force us to expand Ariel's (2020) typology. However, the annotation of the Kálmán corpus revealed new types for describing the use of or. One of these types is when its function is to connect the elements of a list, simply for the purpose of enumerating illustrations and examples for a statement, and not for choosing a particular alternative (42).
(42) Hiszen a hiába vagy az izibe töve már inkább csak fosszília a magyarban (Kálmán corpus)
‘After all, the root of “in vain” or izibe is more of a fossil in Hungarian’
A rather high proportion of Kálmán's or constructions (12%, 17 occurrences) belongs to this category, while it did not occur in any of the other three corpora.
6 Discussion
The data on Hungarian vagy ‘or’ support Ariel's (2020) findings in several respects. (1) In the examined corpora containing spoken language and informal texts, or is rarely used in the traditional logical sense. (2) It is remarkably frequent that the alternatives do not carry different contextual interpretations. (3) In Hungarian spoken language (BEA), Higher-level category, where the alternatives do not even establish distinct discourse entities, is also one of the most frequent readings. (4) The or/token ratio of the Kálmán corpus is very similar to the ratio found in SBC: 4–5 or/1,000 tokens. (5) Categorization is often ambiguous, especially when deciding between types with very similar parameters. It is an inherent feature of the presented hierarchy that “whereas the top and bottom readings are maximally differentiated, the constructions in the middle are less differentiated” (Ariel 2020, 11). (6) When analyzing spoken corpora, we can often assign two categories to an example, an HLC and a member-focus type, which indicates that an implicit HLC is always present. (7) Finding the proper type often requires the examination of prominence.
Considering the above-mentioned similarities, we can conclude that Ariel's (2020) system can be adapted to Hungarian. However, some clarifications seem to be necessary for the purpose of formal analysis. Our findings based on the examination of the Hungarian data are the following. (1) Some distinctions are hard to formalize; a better alternative can be based on mentalization, i.e., taking the interlocutors' mental states into account: their beliefs, desires, and intentions. We found that Unresolved choice and Dilemma can be differentiated if we consider the speaker's expectations: in the case of the former but not the latter, the addressee is expected to provide an answer. (2) Some distinctions can be based on structural (grammatical) properties. For instance, an Indifference or construction is less prominent, typically appearing in a subordinate clause, whereas a Free alternative or construction is more prominent, typically appearing in a main clause. The same is true for Higher-level category (less prominent) and Raised options (more prominent). (3) Some or constructions have a purely rhetorical function: in many cases, the alternatives are not real possibilities. Raised options and HLC are often used in this function, expressing enhancement or exaggeration. (4) Finally, the analysis of Kálmán's articles called attention to less prominent readings of or, such as the one generating and combining illustrations, examples, and lists, which is practically missing from the previously examined, more informal corpora.
Comparing the results for the different text types, we found that written language corpora (MNSZ2 and Kálmán) differ from spoken language corpora (SBC and BEA) in several respects in the use of or, especially in the case of the HLC reading. Most importantly, HLC is not the most frequent type in either of the written language corpora. In fact, the use of HLC exhibits a decreasing trend along the following line: spoken language corpus (SBC then BEA) – written language informal corpus (MNSZ2) – Kálmán's popular scientific corpus. These results can be explained by the functions of higher-level category or in different text types, such as overcoming word finding difficulties in spoken language, or figurative and metaphorical language use in both spoken and written language. At the same time, this stylistic device is less typical of the academic style, at least in modern linguistics, and thus it is understandable why the occurrence rate of HLC is much lower in Kálmán's texts.
To conclude, we provide two graphs to illustrate the data presented in this paper: the distribution of or types in Ariel's (2020) work and in the 3 Hungarian corpora. Figure 1 shows how the different types are represented in each of the 4 corpora. It can be seen that HLC is the most frequent type in SBC, much more common than Raised options, which is the second most frequent. The order is reversed in BEA, where the most common type is Raised options, though HLC is still rather frequent. In written corpora, on the other hand, HLC is much less common, especially in the more formal Kálmán corpus. In these texts, a more traditional use of or, namely Narrowed, is the most frequent. We can also see that a special use of or appears in the Kálmán corpus, which is not typical of spoken language or informal written language: the conjunction function connecting lists of examples.
Figure 1. The distribution of or types in SBC (Ariel 2020), and in the 3 examined Hungarian corpora (random samples): BEA (spoken corpus), MNSZ2 (written, personal), and the Kálmán corpus
Citation: Acta Linguistica Academica 71, 1-2; 10.1556/2062.2024.00675
The proportions of or types in the Kálmán corpus indicate that despite the humorous, direct language of the Qubit articles, the structure of the sentences does not follow the patterns of the informal register. Based on these results, it can be assumed that machine analysis can reveal a great deal of information about texts, as long as it is based on a well-chosen linguistic element and uses carefully prepared learning corpora.
Figure 2 presents the same data from the opposite perspective, illustrating the distribution of the different corpora within each reading of or. It is apparent from the graph that HLC becomes less common as the formality of the text increases, which is an important observation of this paper.
Figure 2. The or types discussed in Ariel (2020) with their occurrences in the different corpora
Investigating the informal subcorpus of MNSZ2 and the spoken language corpus BEA yielded results similar to Ariel's (2020). We found that her system can be adapted to Hungarian: the proportions of the established or types are rather similar, and even her claims about HLC are relatively valid. The main significance of investigating Kálmán's popular scientific articles is that it revealed that there can be major differences between text types. In the Kálmán corpus, the proportions were considerably different: he typically used vagy ‘or’ in a completely different sense, and even new categories had to be added to the system. This is the motivation behind further investigating the use of vagy ‘or’ in Hungarian.
7 Future work
The presented investigation is part of a larger study. We plan to expand our research in several directions in the future. First, we plan to analyze other subcorpora (e.g., literary language, legal, political, and scientific texts) and compare the frequencies of the categories measured in different styles. Second, we intend to further examine the role of grammatical properties: sentence structure and modality. Third, we plan to analyze or constructions outside Ariel's (2020) hierarchy more thoroughly, some of which are mentioned in her earlier papers, such as Repair (also found in the Hungarian corpora) (43), ‘I can't imagine’ (44), idiomatic expressions (45), or the conjunctive cases mentioned above, where the meaning of the structure is ‘both’, and or can be substituted with and (46).
(43) for dinner […] or breakfast (Ariel & Mauri 2018, 993)
(44) Is that um, … full of yucky stuff? Or what. (Ariel & Mauri 2018, 994)
(45) One way or the other? (Ariel & Mauri 2018, 994)
(46) But for mathematics or for science, it's an opportunity for them to get closer to the chaos (Ariel & Mauri 2018, 965)
Additional or constructions can be found in Hungarian, for instance Approximative or (cf. Halm & Bende-Farkas 2019) meaning ‘approximately n pieces of something’ (47). Though it is clearly idiomatic, we aim to find out whether it can be placed in the system.
(47) mintha egyszerre lefogyott volna vagy hat-nyolc kilót (MNSZ2 doc#1919)
‘as if she's lost about 6-8 kilos’
Finally, or appears in several collocations, which restrict the possible interpretations, that is, the possible relations between the alternatives. For instance, vagy éppen / vagy pont ‘or just’ expresses contrast between the alternatives; vagy csak / vagy legalább ‘or just’ / ‘or at least’ indicates a weaker alternative; vagy inkább ‘or rather’ a better alternative; vagy esetleg ‘or perhaps’ a less probable alternative; and vagy-vagy ‘either … or …’ is the formula for exclusive or in Hungarian. This phenomenon is also worth studying. Our long-term goal is to create a corpus consisting of different genres and styles, in which, in addition to machine-annotated tokens, or constructions are manually annotated, and which can be used as a learning corpus for further research. Our more distant research plans also include providing a formal analysis of vagy in a system called ℜeALIS ‘Reciprocal And Lifelong Interpretation System’ (Alberti et al. 2019), which can be characterized as a discourse-representation-based (Kamp, van Genabith & Reyle 2011; Asher & Lascarides 2003) formal pragmasemantic theory representing the interlocutors' mental states (beliefs, desires and intentions, BDIs).
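As a rough illustration of how the collocations listed above could serve as cues in a machine-annotated corpus, the following sketch maps each collocation to the relation it is said to express. The table contents come from the text; the names `COLLOCATION_RELATIONS` and `collocation_relation`, the matching strategy, and the test sentences are assumptions made for illustration. Paired vagy-vagy is deliberately left out, since surface matching alone cannot tell it apart from two independent occurrences of vagy.

```python
import re
from typing import Optional

# Collocation -> relation between the alternatives, as listed in the text.
COLLOCATION_RELATIONS = {
    "vagy éppen": "contrast",
    "vagy pont": "contrast",
    "vagy csak": "weaker alternative",
    "vagy legalább": "weaker alternative",
    "vagy inkább": "better alternative",
    "vagy esetleg": "less probable alternative",
}


def collocation_relation(sentence: str) -> Optional[str]:
    """Return the relation cued by the first matching collocation, else None."""
    text = sentence.lower()
    for collocation, relation in COLLOCATION_RELATIONS.items():
        # Word boundaries prevent matching inside longer words (e.g. "pontos").
        if re.search(r"\b" + re.escape(collocation) + r"\b", text):
            return relation
    return None


# Hypothetical input sentences, not taken from the corpora.
cue_rather = collocation_relation("Egészen mást, vagy inkább valami hasonlót")
cue_plain = collocation_relation("kávét vagy teát")
```

Such a lookup only proposes a candidate relation; the manual annotation described above would still be needed to confirm the reading in context.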
References
Alberti, Gábor, Mónika Dóla, Eszter Kárpáti, Judit Kleiber, Anna Szeteli and Anita Viszket. 2019. Towards a cognitively viable linguistic representation. Argumentum 15. 62–80.
Ariel, Mira. 2016. What’s a distinct or alternative? Journal of Pragmatics 103. 1–14.
Ariel, Mira. 2020. Or constructions, argumentative direction and disappearing ‘alternativity’. Language Sciences 81. 101195.
Ariel, Mira and Caterina Mauri. 2018. Why use or? Linguistics 56(5). 939–994.
Ariel, Mira and Caterina Mauri. 2019. An ‘alternative’ core for or. Journal of Pragmatics 149. 40–59.
Asher, Nicholas and Alex Lascarides. 2003. Logics of conversation. Cambridge: Cambridge University Press.
Bánréti, Zoltán. 1992. A mellérendelés [Coordination]. In F. Kiefer (ed.) Strukturális magyar nyelvtan 1.: Mondattan [Structural Hungarian grammar 1: Syntax]. Budapest: Akadémiai Kiadó. 654–729.
Bánréti, Zoltán (ed.). 2022. Syntax of Hungarian: Coordination and ellipsis. Amsterdam: Amsterdam University Press.
Bereczkei, Klára. 2007. Marking logical connection in presentations. WoPaLP 1. 78–98.
Fercsik, Erzsébet. 2001. A kötőszók szerepe a tankönyvi szövegekben [The function of conjunctions in textbooks]. Könyv és Nevelés [Book and Education] 3(3).
Gábor, Kata, Enikő Héja and Ágnes Mészáros. 2004. Kötőszók korpuszalapú vizsgálata [Corpus based examination of Hungarian conjunctions]. In Z. Alexin and D. Csendes (eds.) A Második Magyar Számítógépes Nyelvészeti Konferencia előadásainak kötete [Proceedings of the Second Hungarian Conference on Computational Linguistics]. Szeged: Department Group of Informatics, University of Szeged. 305–306.
Gärtner, Hans-Martin and Beáta Gyuris. 2012. Pragmatic markers in Hungarian: Some introductory remarks. Acta Linguistica Hungarica 59. 387–426.
Grice, H. Paul. 1989. Studies in the way of words. Cambridge, MA: Harvard University Press.
Gyuris, Beáta. 2017. New perspectives on bias in polar questions: A study of Hungarian -e. International Review of Pragmatics 9(1). 1–55.
Halm, Tamás and Ágnes Bende-Farkas. 2019. The birth of an epistemic indefinite: Vaegy in Transylvanian Hungarian. Handout. SinFonIJA 12, Brno, 12–14 September 2019.
Hunston, Susan. 2008. Collection strategies and design decisions. In A. Lüdeling and M. Kytö (eds.) Corpus linguistics: An international handbook. Berlin: Walter de Gruyter. 154–167.
Kálmán, László (ed.). 2001. Magyar leíró nyelvtan: Mondattan I. (Segédkönyvek a Nyelvészet Tanulmányozásához VI.) [Hungarian descriptive grammar: Syntax I. (Resource Books for the Study of Linguistics VI)]. Budapest: Tinta Könyvkiadó.
Kamp, Hans. 1973. Free choice permission. Proceedings of the Aristotelian Society 74. 57–74.
Kamp, Hans, Josef van Genabith and Uwe Reyle. 2011. Discourse representation theory. In D. Gabbay and F. Guenthner (eds.) Handbook of philosophical logic. Berlin: Springer. 125–394.
Kárpáti, Eszter, Judit Kleiber, Anita Viszket, Judit Hagymási, Laura Kárpáti, Eszter Kocsis, Zsuzsanna Ruppl, Blanka Natasa Fürész and Beier Liu. 2022. Authors’ choice? Differences in the usage of or. Linguistics Beyond and Within – Modular Interfaces and Extra-Systemic Pressures in Linguistic Analysis: Book of Abstracts. 44–45.
Kiefer, Ferenc. 2007. Jelentéselmélet [Meaning theory]. Budapest: Corvina.
Landis, J. Richard and Gary G. Koch. 1977. The measurement of observer agreement for categorical data. Biometrics 33(1). 159–174.
Mihajlik, Péter, András Balog, Tekla Etelka Gráczi, Anna Kohári, Balázs Tarján and Katalin Mády. 2022. BEA-base: A benchmark for ASR of spontaneous Hungarian. LREC 2022. 1970–1977.
Pruitt, Kathryn and Floris Roelofsen. 2013. The interpretation of prosody in disjunctive questions. Linguistic Inquiry 44(4). 632–650.
Rounds, Carol. 2001. Hungarian: An essential grammar. London: Routledge.
Surányi, Balázs and Máté Gulás. 2022. A diszjunkció mint Pozitív Polaritású Elem: A prozódia hatása a magyar diszjunktív tagadó mondatok értelmezésére [Disjunction as a Positive Polarity Item: The effect of prosody on the interpretation of Hungarian negated disjunctive sentences]. Jelentés és Nyelvhasználat [Meaning and Language Use] 9(1). 185–212.
Trebits, Anna. 2009. Conjunctive cohesion in English language EU documents – A corpus-based analysis and its implications. English for Specific Purposes 28(3). 199–210.
Viszket, Anita, Eszter Kárpáti and Judit Kleiber. 2021. Why use Hungarian or? Linguistics Beyond and Within – Hierarchies, Boundaries and Continua in Linguistics: Book of Abstracts. 158–159.
Other sources
MNSZ2 corpus: http://clara.nytud.hu/mnsz2-dev/.
Qubit: https://qubit.hu.
László Kálmán’s Qubit articles: https://qubit.hu/author/kalmanl.
László Kálmán mentioned vagy ‘or’ only marginally in his papers: we have found just a few general statements about this topic, in introductory works with co-authors (e.g., Kálmán 2001, 101). Therefore, he will now be the subject of the investigation and not the source. In any case, it was very typical of him that he wrote with co-authors, which is why it was difficult for us to build a corpus of his texts that he wrote alone.
Besides the practical benefits of identifying such criteria, the need for formalization could lead to designing a formalized typology, which is one of our more distant research plans. For this task, we would apply ℜeALIS (Alberti et al. 2019), a formal pragmasemantic system that represents the interlocutors' mental states (BDIs) and thus seems suitable for capturing the intended meaning behind or constructions.