Abstract

Background and aims

Problematic exercise (PE) has mainly been assessed with self-report instruments. However, summarized evidence on the reliability of the scores derived from such instruments has yet to be provided. The present study reports a reliability generalization meta-analysis of six well-known self-report measures of PE (Commitment to Exercise Scale, Compulsive Exercise Test, Exercise Addiction Inventory, Exercise Dependence Questionnaire, Exercise Dependence Scale, and Obligatory Exercise Questionnaire).

Methods

Pooled effect sizes were computed using a random-effects model with restricted maximum likelihood estimation. Univariable and multivariable meta-regression analyses were used to test moderator variables.

Results

Data retrieved from 255 studies (741 independent samples, N = 254,174) identified three main groups of findings: (i) pooled alpha values that, ranging from 0.768 to 0.930 for global scores and from 0.615 to 0.907 for subscale scores, were found to be sensitive to sociodemographic and methodological characteristics; (ii) reliability induction rates of 47.58%; and (iii) the virtually non-existent testing of the assumptions required for the proper applicability of alpha. Data unavailability prevented the provision of summarized reliability estimates in terms of temporal stability.

Discussion

These findings highlight the need to improve the reporting of score reliability for self-report instruments of PE in primary studies. This implies providing both prior justification for the appropriateness of the index employed and reliability data for all the subpopulations of interest. The values presented could be used as a reference both for comparisons with those obtained in future primary studies and for correcting measurement-related artefacts in quantitative meta-analytic research concerning PE.

Introduction

Promotion of regular physical activity has been proposed as a comprehensive and valid strategy to reduce cardiovascular risk (Ding et al., 2016). One of the domains in which physical activity is most frequently undertaken is leisure time, in particular through recreational participation in sports activities or by engaging in exercise conditioning/training (Bull et al., 2020). However, a small proportion of the population may develop a potentially dysfunctional pattern of exercise behaviour (Marques et al., 2019). This is a complex and multifaceted phenomenon that, irrespective of the different umbrella terms used to refer to it (e.g., problematic exercise; Scharmer, Gorrell, Schaumberg, & Anderson, 2020; or morbid exercise behaviour; Szabo, Demetrovics, & Griffiths, 2018), implies losing control over exercise behaviour to the point of experiencing harm at a physical level (e.g., injuries or immune problems), psychological level (e.g., altered mood states or inability to concentrate), or social level (e.g., loss of social relationships or job) (Juwono & Szabo, 2021; Szabo et al., 2018).

Existing research on the phenomenon – hereafter referred to as ‘problematic exercise’ (PE) – has mainly relied on quantitative techniques and, more specifically, self-report instruments (Marques et al., 2019; Szabo, Griffiths, de La Vega Marcos, Mervó, & Demetrovics, 2015). To date, much research has examined the psychometric properties of scores obtained from translations of the original English versions of such instruments in non-English-speaking countries in Europe (Mónok et al., 2012; Sauchelli et al., 2016; Sicilia, Alías-García, Ferriz, & Moreno-Murcia, 2013; Zeeck et al., 2017), South America (Alchieri et al., 2015; Sicilia et al., 2017), and Asia (Li, Nie, & Ren, 2016; Shin & You, 2015). However, much less effort has been spent on examining the psychometric properties of these PE scores among specific populations (e.g., in terms of their clinical condition [Formby, Watson, Hilyard, Martin, & Egan, 2014] or the exercise modality practised [Lichtenstein & Jensen, 2016]), or on whether these properties can be generalized across different countries or languages (Griffiths et al., 2015). This is an important limitation for a psychometric property such as reliability (i.e., measurement precision), which is highly dependent on both the test application conditions and the characteristics of the sample under consideration (Slaney, 2017). A main practical implication concerns cross-group comparisons, because unequal reliability between groups can lead to wrong conclusions when comparing their respective scores (Graham & Unterschute, 2015). This is a matter of relevance in PE research because sample characteristics (e.g., exercise modality practised or being at-risk of an eating disorder) are frequently used for comparison purposes (Di Lodovico, Poulnais, & Gorwood, 2019; Trott et al., 2020).

Having a comprehensive understanding of the effect of the sample and application characteristics on the score reliability of self-report instruments assessing PE is likely to contribute to advancing the science in this field. For example, this knowledge may assist practitioners and researchers in choosing an assessment tool capable of producing reliable scores across a range of circumstances. However, there is no summarized evidence on the reliability of scores derived from self-report instruments assessing PE across populations and application conditions.

Reliability Generalization (RG) meta-analysis provides cumulative evidence on elements contributing to the variability of test score reliability across studies (Vacha-Haase, Kogan, & Thompson, 2000; Vacha-Haase, Henson, & Caruso, 2002). Despite many reliability indices being available (Cho, 2016), RG meta-analyses often present information only on Cronbach’s alpha coefficients (e.g., Graham & Unterschute, 2015; Vicent, Rubio-Aparicio, Sánchez-Meca, & Gonzálvez, 2019). This is due to the overwhelming use of alpha in primary studies (Hoekstra, Vugteveen, Warrens, & Kruyen, 2019). However, it has been suggested that this prevalent use of alpha owes more to compliance (i.e., alpha being perceived as a common and required practice; Hoekstra et al., 2019) than to its superiority over other reliability indices or, as would be methodologically sound, its adequacy given the nature of the data (Cho, 2016). Indeed, whether alpha functions as an unbiased reliability estimator depends on the fulfilment of three main assumptions: (i) the unidimensionality of the test, (ii) the equality of the factor loadings of the items (i.e., tau-equivalence; if not met, alpha will underestimate reliability), and (iii) the independence of the error terms of the items (if not met, alpha will overestimate reliability) (Cho & Kim, 2015).

Based on these considerations, it follows that providing evidence on whether reported alpha values have been obtained after testing the assumptions required for the unbiased use of such a coefficient may be of interest from the perspective of RG meta-analysis. Similar ways of proceeding are common in RG meta-analysis (e.g., Graham & Unterschute, 2015; Vicent et al., 2019) with regard to another questionable reporting practice that may also influence the scope of the results, namely, reliability induction (i.e., not reporting reliability estimates for the data at hand; Vacha-Haase et al., 2000). By contrast, almost no attention has been paid to date in RG meta-analysis to alpha reporting practices in terms of their application assumptions (Vacha-Haase & Thompson, 2011). In view of these considerations, it is reasonable to suggest that examining both the rate of reliability induction and the extent to which the assumptions underlying the unbiased performance of alpha are tested may lead to a more accurate and comprehensive interpretation of the results provided in RG meta-analysis.

Within this context, the present RG meta-analysis addresses three objectives concerning several widely used instruments proposed in the self-reported assessment of PE. More specifically, these are to (i) estimate the average reliability of the test scores under consideration; (ii) examine the sociodemographic and methodological characteristics that may affect the reliability estimates of the test scores of interest; and (iii) examine the reliability reporting practices of studies employing these instruments. The latter will be done (a) by examining the reliability induction rates; and (b) in view of the very likely possibility that alpha will be the most frequently reported index (Cho, 2016), by examining the extent to which the assumptions for unbiased estimates of such coefficient are tested and met.

Method

The systematic review and meta-analysis was conducted in accordance with the checklist from the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) (Moher, Liberati, Tetzlaff, & Altman, 2009) and was registered on PROSPERO (CRD42021237100) (see Supplementary material A).

Locating studies

Electronic bibliographic databases MEDLINE, PsycINFO, Web of Science, Current Contents Connect, SciELO, and Dissertations & Theses Global were searched for eligible studies from inception to January 30, 2020 (see Supplementary material B for the full search strategy). No geographical or cultural restrictions were applied. Reference lists of all retrieved studies were hand-searched to identify further potentially eligible studies.

The references of the retrieved studies were managed in EndNote X9. Studies were independently selected by two of the authors in two stages by examining (a) their titles and abstracts, and (b) their full texts. Disagreements were discussed and resolved by consensus, with the assistance of a third author if needed.

Eligibility criteria

The review collated data from studies employing the most widely used self-report instruments for the assessment of symptoms of PE (i.e., exercising to the point of losing control over the behaviour, so that it may lead to physical, psychological, or social damage; Szabo et al., 2018). According to the findings from previous reviews conducted in the field of PE (e.g., Alcaraz-Ibáñez, Paterna, Sicilia, & Griffiths, 2020, 2021), the following six key instruments were considered eligible: Commitment to Exercise Scale (CES), which assesses the extent to which (i) individuals’ well-being is influenced by exercising, (ii) adherence to exercise is maintained in the face of adverse conditions, and (iii) exercise regimen interferes with social commitments (Davis, Brewer, & Ratusny, 1993); Compulsive Exercise Test (CET), which assesses the primary factors operating in the maintenance of excessive exercise within the eating disorders domain (Taranis, Touyz, & Meyer, 2011); Exercise Addiction Inventory (EAI), which assesses six common criteria proposed for behavioural addictions (Terry, Szabo, & Griffiths, 2004); Exercise Dependence Questionnaire (EDQ), which assesses elements employed in traditional models of addiction and both psychologically-related and socially-related consequences of exercise behaviour (Ogden, Veale, & Summers, 1997); Exercise Dependence Scale (EDS-21), which assesses seven criteria adapted from substance abuse defined in the Diagnostic and Statistical Manual for Mental Disorders (American Psychiatric Association, 1994) applied to the exercise domain (Downs, Hausenblas, & Nigg, 2004); and Obligatory Exercise Questionnaire (OEQ), which assesses the subjective need to engage in repetitive exercise behaviours (Pasman & Thompson, 1988).

The eligibility of these instruments was also supported by the findings derived from a search on Google Scholar performed by the present authors for all the 17 measures previously identified within the field (Sicilia, Paterna, Alcaraz-Ibáñez, & Griffiths, 2021). In particular, these instruments were shown to be the ones with the highest number of citations (see Supplementary material C).

Inclusion criteria

Studies were considered eligible if the following criteria were met: (a) at least one of the following six self-report instruments of PE was used: CES, CET, EAI, EDQ, EDS-21, OEQ; (b) they were written in English, Spanish, French, or Portuguese (the working languages of the review team); and (c) some estimate of reliability was provided (e.g., Cronbach’s alpha [α], intra-class correlation index [ICC], or Pearson’s correlation index [r]).

Exclusion criteria

Studies were excluded on the basis of the following criteria: (a) only composite scores comprising two or more instruments assessing PE were provided so that individual scores were not available; (b) specific items were excluded when obtaining global scores of PE and sub-domain scores were not available; (c) specific items were excluded when obtaining sub-scale scores of PE; (d) the scores of PE were obtained using a factorial structure that was partially or completely altered from the one originally proposed for the instrument; and (e) the study included fewer than 30 participants. The first four exclusion criteria were implemented with the aim of fulfilling one of the main assumptions of meta-analytic research (i.e., the application of a similar statistical configuration) (Lipsey & Wilson, 2001). The final exclusion criterion was implemented on the basis of the increased sampling error and variations in the assessment of heterogeneity likely introduced by studies with small sample sizes (Lin, 2018).

Coding procedure

A coding frame was developed taking into account the common features of the studies retrieved in a preliminary search. After being pilot-tested, the coding sheet was used by two of the present authors when extracting the relevant data from the retrieved studies (see Supplementary material D). Disagreements between the reviewers were discussed and resolved on a consensual basis with the assistance of a third author if necessary. The following coding categories were considered: (i) citation and year of publication; (ii) sample size; (iii) exercise modality; (iv) eating disorders (EDs); (v) report of leisure time exercise; (vi) regular exercisers; (vii) region (geographic location); (viii) test version; (ix) type of survey; (x) publication status; (xi) study design; (xii) mean and standard deviation (SD) of test scores; (xiii) mean and SD of age; (xiv) % of Whites; (xv) % of females; and (xvi) PE measure. These coded features were considered for descriptive purposes and – where appropriate – as potential moderator variables (Rosenthal, 1995).

Statistical analysis

Effect size calculations

Cronbach’s alpha (α) was employed as the effect size index. In order to normalize their distributions and stabilize their variances, the reliability coefficients were transformed by applying the formula proposed by Bonett (2002) before conducting the statistical analyses. To facilitate interpretation of the results, effect sizes and their 95% confidence intervals (CIs) were subsequently back-transformed to the alpha metric (Sánchez-Meca, López-López, & López-Pina, 2013).
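The transformation step can be illustrated with a short Python sketch. It is not the authors' code (the analyses were run in R with metafor). It assumes the sign convention −ln(1 − α), so that larger transformed values mean higher reliability, and the usual large-sample variance 2J / ((J − 1)(n − 2)) for a test with J items and n participants. The function names and the example sample (α = 0.85, 8 items, 200 participants) are hypothetical.

```python
import math


def bonett_transform(alpha: float) -> float:
    """Map an alpha coefficient (< 1) onto the unbounded scale -ln(1 - alpha)."""
    if alpha >= 1.0:
        raise ValueError("alpha must be below 1")
    return -math.log(1.0 - alpha)


def bonett_variance(n_items: int, n_participants: int) -> float:
    """Approximate sampling variance of the transformed coefficient."""
    if n_items < 2 or n_participants < 3:
        raise ValueError("need at least 2 items and 3 participants")
    return 2.0 * n_items / ((n_items - 1) * (n_participants - 2))


def back_transform(value: float) -> float:
    """Return a transformed value to the alpha metric."""
    return 1.0 - math.exp(-value)


def alpha_confidence_interval(alpha, n_items, n_participants, z=1.96):
    """Build the interval on the transformed scale, then back-transform it."""
    center = bonett_transform(alpha)
    half_width = z * math.sqrt(bonett_variance(n_items, n_participants))
    return back_transform(center - half_width), back_transform(center + half_width)


# Hypothetical primary study: alpha = 0.85, 8 items, 200 participants.
low, high = alpha_confidence_interval(0.85, n_items=8, n_participants=200)
print(f"95% CI on the alpha metric: {low:.3f} to {high:.3f}")
```

Because the interval is built on the transformed scale, it is asymmetric around α once back-transformed, which is why the CIs reported for pooled alphas need not be centred on the point estimate.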

Due to the expected heterogeneity between studies in terms of participants’ characteristics, and assuming that variations in the distribution and sampling errors of effect sizes may help explain differences between them, the pooled effect sizes were computed using a random-effects model with an estimation method robust to departures from normality (i.e., restricted maximum likelihood, REML) (Pigott, 2012). The I² statistic was used to assess statistical heterogeneity, with values of 25%, 50%, and 75% indicating low, moderate, and high heterogeneity, respectively (Higgins, Thompson, Deeks, & Altman, 2003). The robustness of the summarized estimates was examined through sensitivity analyses (i.e., by conducting systematic reanalysis while removing studies one at a time). Results from sensitivity analyses (see Supplementary material E) were considered meaningful when corrected estimates fell outside the 95% CI of the original ones.
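The pooling, heterogeneity, and leave-one-out steps described above can be sketched in plain Python. This is an illustrative approximation, not the metafor implementation: between-study variance is obtained with a simple fixed-point iteration of the REML estimating equation, I² is computed from that variance and a "typical" within-study variance, and all function names and data are hypothetical. Inputs are assumed to be already on the transformed scale.

```python
import math


def reml_pool(y, v, tol=1e-10, max_iter=1000):
    """Random-effects pooled estimate with a fixed-point REML tau^2.

    y: transformed effect sizes; v: their sampling variances.
    Returns (pooled estimate, standard error, tau^2).
    """
    tau2 = 0.0
    for _ in range(max_iter):
        w = [1.0 / (vi + tau2) for vi in v]
        sw = sum(w)
        mu = sum(wi * yi for wi, yi in zip(w, y)) / sw
        num = sum(wi ** 2 * ((yi - mu) ** 2 - vi) for wi, yi, vi in zip(w, y, v))
        new_tau2 = max(0.0, num / sum(wi ** 2 for wi in w) + 1.0 / sw)
        converged = abs(new_tau2 - tau2) < tol
        tau2 = new_tau2
        if converged:
            break
    w = [1.0 / (vi + tau2) for vi in v]
    sw = sum(w)
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sw
    return mu, math.sqrt(1.0 / sw), tau2


def i_squared(v, tau2):
    """I^2 (%) as tau^2 relative to tau^2 plus a typical sampling variance."""
    w = [1.0 / vi for vi in v]
    k = len(v)
    typical = (k - 1) * sum(w) / (sum(w) ** 2 - sum(wi ** 2 for wi in w))
    return 100.0 * tau2 / (tau2 + typical)


def leave_one_out(y, v):
    """Pooled estimates obtained after removing each effect size in turn."""
    return [
        reml_pool(y[:i] + y[i + 1:], v[:i] + v[i + 1:])[0]
        for i in range(len(y))
    ]


# Hypothetical transformed alphas and sampling variances from four samples.
y = [1.5, 2.0, 2.5, 1.8]
v = [0.01, 0.01, 0.01, 0.01]
mu, se, tau2 = reml_pool(y, v)
print(f"pooled alpha = {1 - math.exp(-mu):.3f}, I2 = {i_squared(v, tau2):.1f}%")
print([round(1 - math.exp(-m), 3) for m in leave_one_out(y, v)])
```

Under the decision rule stated above, a leave-one-out estimate would be flagged only if, once back-transformed, it fell outside the 95% CI of the full-sample estimate.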

Consistent with previous RG meta-analyses (Rubio-Aparicio, Badenes-Ribera, Sánchez-Meca, Fabris, & Longobardi, 2020), moderator analyses for categorical and continuous variables were conducted provided that at least 15 effect sizes were available. Meta-regression analyses for testing moderator variables were conducted in two stages: first, univariable models were fitted (i.e., each potential moderator was considered in isolation); second, multivariable models were fitted in which all significant moderators identified in the first stage were introduced simultaneously. For better control of the Type I error rate, meta-regressions were conducted using the method proposed by Knapp and Hartung (2003). Given constraints due to available sample size, non-significant categorical predictors were sequentially dropped from the full starting multivariable models in order to obtain the most parsimonious and accurate representation of the data. The tenability of the reduced vs. the full model was judged through a likelihood ratio test (LRT). The variance explained by the moderators was quantified as a percentage and expressed as R². Provided that at least 10 effect sizes were available (Page, Higgins, & Sterne, 2019), publication bias was examined by visual inspection of funnel plot symmetry, Egger’s test, and the ‘trim and fill’ procedure (see Supplementary material F). The statistical analyses described in this section were conducted in R using the metafor package.

Results

Selection of studies

A total of 3,852 studies were identified from multiple database searches. The study selection procedure was conducted in two stages. Firstly, the eligibility criteria were applied to the studies considered for full text assessment (see Fig. 1). Secondly, the report of reliability indices was examined. Despite the intention of including data on temporal stability (e.g., Pearson’s correlation), the number of studies reporting this information was too low for meta-analytical techniques to be applied (i.e., EAI, Griffiths, Szabo, & Terry, 2005; Li et al., 2016; EDQ, Kern & Baudin, 2011; EDS-21, Downs et al., 2004; Kern, 2007). As a result of this process, 255 studies that reported reliability in terms of the alpha coefficient were included in the RG meta-analysis. The study characteristics and their corresponding effect sizes were grouped according to PE measures. Consequently, 741 effect sizes from 255 studies (N = 254,174) were examined in 27 different meta-analyses (see Table 1).

Fig. 1.

PRISMA flow diagram of study selection

Citation: Journal of Behavioral Addictions 2022; 10.1556/2006.2022.00014

Table 1.

Alpha estimates for the scores of instruments assessing problematic exercise

| Measure (Subscale) | Items | Range | Original α | k | Pooled α | 95% CI Lo | 95% CI Up | Q | I² |
|---|---|---|---|---|---|---|---|---|---|
| CES-Likert | 8 | 1–10 | N.R. | 10 | 0.872 | 0.853 | 0.889 | 47.856 | 81.29 |
| CES-VAS | 8 | 0–155 | 0.770 | 30 | 0.842 | 0.816 | 0.864 | 401.834 | 93.60 |
| CET | 24 | 0–5 | 0.850, 0.830 | 48 | 0.880 | 0.868 | 0.891 | 450.903 | 92.99 |
| CET (Avoidance) | 8 | 0–5 | 0.880, 0.880 | 27 | 0.907 | 0.888 | 0.923 | 601.459 | 95.98 |
| CET (Weight control) | 5 | 0–5 | 0.860, 0.850 | 21 | 0.817 | 0.787 | 0.842 | 175.464 | 90.72 |
| CET (Mood improvement) | 5 | 0–5 | 0.750, 0.720 | 20 | 0.801 | 0.779 | 0.836 | 187.271 | 90.71 |
| CET (Lack of enjoyment) | 3 | 0–5 | 0.840, 0.820 | 18 | 0.777 | 0.739 | 0.810 | 155.376 | 88.08 |
| CET (Rigidity) | 3 | 0–5 | 0.730, 0.820 | 23 | 0.771 | 0.748 | 0.793 | 92.048 | 76.36 |
| EAI | 6 | 1–5 | 0.840 | 42 | 0.768 | 0.739 | 0.794 | 2,258.405 | 97.27 |
| EDQ | 29 | 1–7 | 0.843 | 12 | 0.862 | 0.842 | 0.879 | 70.101 | 84.26 |
| EDQ (Interference) | 5 | 1–7 | 0.814 | 7 | 0.743 | 0.676 | 0.795 | 49.772 | 86.57 |
| EDQ (Positive reward) | 4 | 1–7 | 0.795 | 6 | 0.789 | 0.688 | 0.857 | 75.291 | 94.89 |
| EDQ (Withdrawal) | 4 | 1–7 | 0.799 | 7 | 0.772 | 0.719 | 0.815 | 35.498 | 82.67 |
| EDQ (Weight control) | 4 | 1–7 | 0.781 | 6 | 0.721 | 0.670 | 0.764 | 18.925 | 71.44 |
| EDQ (Insight into problem) | 4 | 1–7 | 0.756 | 6 | 0.690 | 0.625 | 0.744 | 24.952 | 78.19 |
| EDQ (Social reasons) | 3 | 1–7 | 0.755 | 6 | 0.615 | 0.489 | 0.710 | 53.587 | 88.86 |
| EDQ (Health reasons) | 3 | 1–7 | 0.701 | 6 | 0.774 | 0.692 | 0.834 | 56.772 | 90.64 |
| EDQ (Stereotyped behaviour) | 2 | 1–7 | 0.516 | 6 | 0.670 | 0.561 | 0.736 | 25.358 | 81.63 |
| EDS-21 | 21 | 1–6 | N.R. | 90 | 0.930 | 0.923 | 0.937 | 3,906.857 | 97.76 |
| EDS-21 (Tolerance) | 3 | 1–6 | 0.780, 0.780 | 43 | 0.857 | 0.840 | 0.872 | 673.810 | 93.94 |
| EDS-21 (Withdrawal) | 3 | 1–6 | 0.930, 0.900 | 42 | 0.828 | 0.809 | 0.845 | 603.767 | 92.86 |
| EDS-21 (Intention effects) | 3 | 1–6 | 0.920, 0.890 | 43 | 0.881 | 0.865 | 0.895 | 906.013 | 95.48 |
| EDS-21 (Lack of control) | 3 | 1–6 | 0.820, 0.820 | 44 | 0.823 | 0.803 | 0.841 | 691.373 | 93.80 |
| EDS-21 (Time) | 3 | 1–6 | 0.880, 0.860 | 43 | 0.848 | 0.833 | 0.862 | 549.977 | 91.82 |
| EDS-21 (Reduction in other activities) | 3 | 1–6 | 0.670, 0.750 | 53 | 0.704 | 0.675 | 0.730 | 692.150 | 92.53 |
| EDS-21 (Continuance) | 3 | 1–6 | 0.890, 0.900 | 43 | 0.834 | 0.816 | 0.851 | 611.499 | 93.26 |
| OEQ 20 | 20 | 1–4 | 0.960 | 38 | 0.870 | 0.853 | 0.885 | 556.527 | 94.43 |

Note. Original α = alpha value(s) reported in the original validation studies; Pooled α = estimated effect size (corrected coefficient alpha); CI = Confidence interval; Lo = Lower; Up = Upper; N.R. = non-reported; CES-VAS = Commitment to Exercise Scale (visual analogue scale); CET = Compulsive Exercise Test; EAI = Exercise Addiction Inventory; EDS-21 = Exercise Dependence Scale-21; OEQ = Obligatory Exercise Questionnaire

Commitment to Exercise Scale

Two different response procedures were employed in the retrieved studies using the CES (i.e., Likert scales or visual analogue scales [VAS]). Given that the homogeneity of statistical configuration across studies is one of the main underlying assumptions of meta-analysis (Lipsey & Wilson, 2001), the scores of the CES (Likert) and CES (VAS) were examined independently.

Commitment to Exercise Scale using Likert scales

The analysis examining alpha estimates for the global score on the CES-Likert (see Forest plot in Supplementary material G) included 10 effect sizes from nine studies involving a total (N total) of 2,891 participants. Results from the random effects model showed a pooled alpha estimate of 0.872 (P < 0.001; 95% CI = 0.853 to 0.889, I² = 81.29). Since the number of effect sizes retrieved was <15, moderation analyses were not conducted.

Commitment to Exercise Scale using visual analogue scales

The analysis examining alpha estimates for the global score on the CES-VAS (see Forest plot in Supplementary material G) included 30 effect sizes from 23 studies (N total = 6,529). Results from the random effects model showed a pooled alpha estimate of 0.842 (P < 0.001; 95% CI = 0.816 to 0.864, I² = 93.60). Results from the univariable meta-regression analysis for categorical variables (see Table 2) identified the following significant moderators: (a) eating disorders (omnibus-test [2, 27] = 7.451; P = 0.003; R² = 33.59); (b) report of leisure time exercise (omnibus-test [1, 28] = 6.096; P = 0.020; R² = 16.93); (c) region (omnibus-test [4, 25] = 3.850; P = 0.014; R² = 28.21); (d) test version (omnibus-test [1, 28] = 5.621; P = 0.025; R² = 13.48); and (e) type of survey (omnibus-test [3, 26] = 3.990; P = 0.018; R² = 25.87). Results from the univariable meta-regression analysis for continuous variables (see Table 3) did not identify any significant moderator. Results from the multivariable meta-regression analysis showed that eating disorders, report of leisure time exercise, test version, and type of survey together explained 68.73% of the variance in the pooled alpha estimate (see Table 4).

Table 2.

Results of univariable meta-regression analyses for categorical variables (global scores)

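Cell entries: K; corrected α (95% CI Lo, Up); I²

| Subgroups | CES-VAS | CET | EAI | EDS-21 | OEQ |
|---|---|---|---|---|---|
| Exercise modality | | | | | |
| Unknown (RC) | 19; 0.843 (0.805, 0.874); 95.23 | 38; 0.887 (0.876, 0.897); 92.07 | 16; 0.783 (0.740, 0.819); 97.46 | 39; 0.946 (0.937, 0.953); 96.57 | 25; 0.867 (0.849, 0.883); 90.02 |
| Unclear | 2; 0.800 (0.746, 0.842); 71.77 | 8; 0.843 (0.790, 0.883); 90.36 | 8; 0.769 (0.710, 0.815); 94.69 | 18; 0.920 (0.896, 0.939); 98.60 | 6; 0.863 (0.829, 0.890); 89.39 |
| Power disciplines | | | 2; 0.733 (0.705, 0.759); 0.00 | 3; 0.918 (0.889, 0.939); 76.70 | |
| Non-endurance | 1; 0.770 (0.642, 0.852) | 1; 0.850 (0.805, 0.885) | 2; 0.708 (0.523, 0.821); 85.27 | 4; 0.917 (0.838, 0.957); 98.09 | |
| Multiple sports | 7; 0.871 (0.860, 0.855); 13.50 | | 6; 0.796 (0.647, 0.882); 98.49 | 9; 0.925 (0.903, 0.941); 95.89 | 2; 0.954 (0.943, 0.962); 34.26 |
| Fitness and health | 1; 0.770 (0.726, 0.807) | | 4; 0.720 (0.661, 0.770); 92.09 | 7; 0.924 (0.903, 0.942); 95.89 | |
| Endurance | | 1; 0.850 (0.822, 0.874) | 4; 0.764 (0.581, 0.867); 96.99 | 10; 0.913 (0.899, 0.925); 93.52 | 5; 0.837 (0.805, 0.864); 89.47 |
| Eating disorders | | | | | |
| Unknown (RC) | 23; 0.824 (0.799, 0.846); 90.69 | 32; 0.863 (0.850, 0.875); 91.31 | 39; 0.770 (0.739, 0.798); 97.35 | 69; 0.930 (0.921, 0.938); 97.90 | 37; 0.869 (0.852, 0.884); 94.61 |
| At-risk | | 2; 0.874 (0.801, 0.921); 91.04 | | 4; 0.963 (0.945, 0.974); 78.37 | |
| Not at-risk | | 1; 0.900 (0.861, 0.928) | | 5; 0.917 (0.894, 0.934); 84.06 | |
| Mixed | 6; 0.857 (0.827, 0.911); 90.33 | 3; 0.902 (0.853, 0.934); 84.62 | 3; 0.745 (0.695, 0.785); 80.49 | 11; 0.922 (0.896, 0.942); 98.00 | 1; 0.900 (0.863, 0.927) |
| Clinical | 1; 0.950 (0.930, 0.964) | 10; 0.927 (0.920, 0.934); 0.01 | | 1; 0.930 (0.913, 0.944) | |
| Report of LTE | | | | | |
| No (RC) | 13; 0.808 (0.766, 0.842); 92.11 | 31; 0.887 (0.875, 0.899); 92.81 | 20; 0.786 (0.740, 0.824); 98.06 | 42; 0.931 (0.920, 0.941); 98.30 | 23; 0.868 (0.849, 0.884); 90.84 |
| Yes | 17; 0.864 (0.836, 0.887); 92.50 | 17; 0.864 (0.839, 0.885); 90.77 | 22; 0.751 (0.714, 0.782); 95.19 | 48; 0.929 (0.919, 0.939); 96.95 | 15; 0.873 (0.839, 0.900); 96.96 |
| Regular exercisers | | | | | |
| Unknown (RC) | 21; 0.838 (0.803, 0.867); 95.06 | 40; 0.886 (0.874, 0.896); 92.08 | 28; 0.784 (0.751, 0.813); 97.57 | 55; 0.934 (0.925, 0.942); 97.93 | 10; 0.883 (0.866, 0.898); 93.10 |
| Yes | 9; 0.851 (0.821, 0.877); 83.40 | 8; 0.842 (0.792, 0.881); 89.36 | 13; 0.729 (0.673, 0.775); 94.91 | 34; 0.922 (0.909, 0.934); 97.24 | 28; 0.827 (0.800, 0.850); 88.61 |
| Region | | | | | |
| Unknown (RC) | 2; 0.815 (0.719, 0.878); 88.12 | 7; 0.905 (0.872, 0.929); 92.18 | 8; 0.790 (0.756, 0.819); 90.96 | 16; 0.941 (0.928, 0.951); 94.85 | 13; 0.891 (0.863, 0.914); 94.59 |
| South America | 18; 0.820 (0.789, 0.847); 91.29 | | 2; 0.640 (0.527, 0.726); 82.69 | 4; 0.880 (0.856, 0.900); 75.04 | |
| Oceania | | 7; 0.890 (0.861, 0.913); 84.45 | 2; 0.704 (0.649, 0.750); 0.00 | 1; 0.930 (0.911, 0.945) | 7; 0.854 (0.804, 0.892); 94.19 |
| North America | 7; 0.875 (0.832, 0.907); 90.63 | 12; 0.864 (0.841, 0.884); 87.56 | 5; 0.837 (0.795, 0.871); 87.39 | 29; 0.938 (0.927, 0.947); 97.52 | 15; 0.858 (0.833, 0.879); 93.31 |
| Mixed | 1; 0.950 (0.930, 0.964) | 4; 0.890 (0.795, 0.941); 93.92 | | 4; 0.938 (0.889, 0.965); 98.47 | |
| Europe | 2; 0.834 (0.803, 0.862); 67.28 | 18; 0.875 (0.861, 0.887); 89.97 | 24; 0.745 (0.706, 0.779); 96.82 | 34; 0.920 (0.905, 0.932); 97.69 | 3; 0.855 (0.787, 0.901); 85.50 |
| Asia | | | 1; 0.920 (0.914, 0.926) | 2; 0.942 (0.525, 0.993); 99.67 | |
| Test version | | | | | |
| Original (RC) | 11; 0.874 (0.834, 0.904); 93.81 | 44; 0.877 (0.865, 0.888); 92.89 | 21; 0.795 (0.765, 0.821); 95.19 | 58; 0.936 (0.928, 0.944); 97.38 | 38; 0.870 (0.853, 0.855); 94.43 |
| Linguistically adapted | 19; 0.820 (0.791, 0.846); 81.28 | 4; 0.905 (0.857, 0.937); 85.16 | 21; 0.739 (0.687, 0.782); 97.73 | 32; 0.918 (0.902, 0.930); 97.96 | |
| Type of survey | | | | | |
| Unknown (RC) | 15; 0.871 (0.844, 0.893); 91.89 | 36; 0.884 (0.871, 0.896); 92.54 | 25; 0.786 (0.751, 0.816); 97.45 | 46; 0.929 (0.919, 0.938); 97.18 | 24; 0.863 (0.848, 0.878); 90.19 |
| Paper-pencil | 2; 0.708 (0.592, 0.791); 55.12 | 9; 0.878 (0.859, 0.895); 54.25 | 5; 0.727 (0.634, 0.796); 96.08 | 26; 0.928 (0.914, 0.940); 97.66 | 8; 0.867 (0.822, 0.901); 94.57 |
| On-line | 12; 0.820 (0.780, 0.852); 91.66 | 3; 0.861 (0.822, 0.893); 94.61 | 11; 0.748 (0.676, 0.803); 96.12 | 16; 0.930 (0.905, 0.948); 98.26 | 4; 0.881 (0.788, 0.933); 96.45 |
| Both | 1; 0.770 (0.714, 0.815) | | 1; 0.710 (0.669, 0.746) | 2; 0.968 (0.923, 0.987); 97.81 | 2; 0.920 (0.687, 0.979); 98.31 |
| Publication status | | | | | |
| Published (RC) | 24; 0.830 (0.800, 0.855); 93.76 | 41; 0.881 (0.868, 0.893); 94.10 | 38; 0.759 (0.728, 0.785); 97.25 | 79; 0.931 (0.923, 0.938); 97.95 | 30; 0.870 (0.849, 0.888); 95.59 |
| Unpublished | 6; 0.882 (0.846, 0.910); 85.65 | 7; 0.870 (0.849, 0.899); 70.78 | 4; 0.843 (0.795, 0.879); 89.55 | 11; 0.920 (0.896, 0.939); 94.01 | 8; 0.870 (0.848, 0.889); 80.17 |
| Study design | | | | | |
| Psychometric (RC) | 5; 0.859 (0.767, 0.914); 96.57 | 8; 0.848 (0.797, 0.887); 93.70 | 12; 0.784 (0.714, 0.837); 97.74 | 9; 0.933 (0.902, 0.954); 98.69 | 6; 0.878 (0.805, 0.924); 96.13 |
| Applied | 25; 0.838 (0.811, 0.861); 92.81 | 40; 0.885 (0.873, 0.895); 91.62 | 30; 0.761 (0.730, 0.789); 96.68 | 81; 0.930 (0.922, 0.937); 97.60 | 32; 0.868 (0.852, 0.883); 93.68 |

Note. α = Corrected coefficient alpha; CI = Confidence interval; Lo = Lower; Up = Upper; RC = Reference category; LTE = Leisure time exercise; CES-VAS = Commitment to Exercise Scale (visual analogue scale); CET = Compulsive Exercise Test; EAI = Exercise Addiction Inventory; EDS-21 = Exercise Dependence Scale-21; OEQ = Obligatory Exercise Questionnaire.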

Table 3.

Results of univariable meta-regression analyses for continuous variables (global scores)

Cell entries: K; β₁; F; P; R²

| Moderators | CES-VAS | CET | EAI | EDS-21 | OEQ |
|---|---|---|---|---|---|
| Mean of test score | 29; 0.000; 0.001; 0.971; 0.00 | 40; −0.025; 0.998; 0.324; 0.36 | 31; −0.250; 4.993; 0.033; 13.08 | 68; −0.243; 4.895; 0.030; 6.37 | 33; −0.047; 0.150; 0.701; 0.00 |
| SD of test score | 29; −0.001; 0.004; 0.948; 0.00 | 39; −0.301; 2.690; 0.109; 5.34 | 31; 0.398; 1.304; 0.263; 2.18 | 66; 0.618; 5.836; 0.019; 6.88 | 33; −0.166; 0.402; 0.531; 0.00 |
| Mean age | 26; 0.011; 1.396; 0.249; 1.41 | 43; −0.008; 0.785; 0.381; 0.00 | 40; −0.003; 0.076; 0.785; 0.00 | 78; −0.001; 0.012; 0.913; 0.00 | 32; −0.005; 0.233; 0.633; 0.00 |
| SD age | 26; 0.029; 0.734; 0.400; 0.00 | 42; 0.008; 0.270; 0.606; 0.00 | 37; 0.000; 0.001; 0.982; 0.00 | 76; −0.007; 0.189; 0.666; 0.00 | 31; 0.004; 0.034; 0.855; 0.00 |
| % of Whites | 14; 0.001; 0.138; 0.717; 0.00 | 17; −0.003; 0.133; 0.720; 0.00 | 7*; −0.003; 0.279; 0.620; 0.00 | 38; −0.003; 2.379; 0.132; 4.50 | 18; 0.003; 0.155; 0.699; 0.00 |
| % of Females | 30; 0.001; 0.273; 0.605; 0.00 | 47; 0.002; 0.948; 0.336; 0.00 | 40; 0.001; 0.167; 0.685; 0.00 | 89; 0.002; 1.544; 0.217; 0.44 | 34; −0.001; 0.156; 0.695; 0.00 |
| Year of publication | 30; 0.002; 0.040; 0.843; 0.00 | 48; 0.027; 3.821; 0.057; 5.95 | 42; −0.005; 0.091; 0.765; 0.00 | 90; 0.008; 0.442; 0.508; 0.00 | 38; −0.012; 1.688; 0.202; 1.40 |

Note. β₁ = estimated regression coefficient; R² = Explained variance; F = Omnibus test; RC = Reference category; CES-VAS = Commitment to Exercise Scale (visual analogue scale); CET = Compulsive Exercise Test; EAI = Exercise Addiction Inventory; EDS-21 = Exercise Dependence Scale-21; OEQ = Obligatory Exercise Questionnaire. Statistically significant effects (P < 0.05) appear highlighted in bold.

*Corresponds to K < 10 and should therefore not be interpreted (Fu et al., 2011)

Table 4.

Results of multivariable meta-regression analyses (global scores)

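| Moderators | K | β₀ | β₁ | SE | F | P | R² |
|---|---|---|---|---|---|---|---|
| CES-VAS | 30 | | | | 51.844 | <0.001 | 68.73 |
| (Intercept) | | 1.779 | | 0.117 | | | |
| Eating disorders (Mixed) | | | 0.281 | 0.133 | | | |
| Eating disorders (Clinical) | | | 0.931 | 0.298 | | | |
| Report of LTE (Yes) | | | 0.286 | 0.117 | | | |
| Test version (linguistically adapted) | | | −0.268 | 0.125 | | | |
| Type of survey (Paper-pencil) | | | −0.476 | 0.222 | | | |
| Type of survey (Online) | | | 0.110 | 0.142 | | | |
| Type of survey (Both) | | | −0.595 | 0.267 | | | |
| CET | 48 | | | | 49.917 | <0.001 | 57.55 |
| (Intercept) | | 2.039 | | 0.043 | | | |
| Eating disorders (At risk) | | | 0.041 | 0.163 | | | |
| Eating disorders (Not at risk) | | | 0.263 | 0.264 | | | |
| Eating disorders (Mixed) | | | 0.255 | 0.147 | | | |
| Eating disorders (Clinical) | | | 0.564 | 0.093 | | | |
| Regular exercisers (Yes) | | | −0.257 | 0.094 | | | |
| EAI | 31 | | | | 38.281 | <0.001 | 59.22 |
| (Intercept) | | 2.251 | | 0.282 | | | |
| Region (South America) | | | −0.334 | 0.168 | | | |
| Region (Oceania) | | | −0.337 | 0.166 | | | |
| Region (North America) | | | 0.023 | 0.145 | | | |
| Region (Europe) | | | 0.139 | 0.102 | | | |
| Test version (linguistically adapted) | | | −0.248 | 0.091 | | | |
| Mean total score* | | | −0.223 | 0.094 | | | |
| EDS-21 | 66 | | | | 37.410 | <0.001 | 38.02 |
| (Intercept) | | 2.938 | | 0.323 | | | |
| Exercise modality (Unclear) | | | −0.380 | 0.137 | | | |
| Exercise modality (Power disciplines) | | | 0.437 | 0.287 | | | |
| Exercise modality (Non-endurance) | | | −0.684 | 0.247 | | | |
| Exercise modality (Multiple sports) | | | −0.382 | 0.169 | | | |
| Exercise modality (Fitness and health) | | | −0.645 | 0.214 | | | |
| Exercise modality (Endurance) | | | −0.488 | 0.159 | | | |
| Mean total score* | | | −0.078 | 0.106 | | | |
| SD total score* | | | 0.203 | 0.228 | | | |
| OEQ | 38 | | | | 64.660 | <0.001 | 68.55 |
| (Intercept) | | 2.096 | | 0.050 | | | |
| Exercise modality (Unclear) | | | 0.156 | 0.114 | | | |
| Exercise modality (Multiple sports) | | | 0.997 | 0.174 | | | |
| Exercise modality (Endurance) | | | 0.295 | 0.160 | | | |
| Regular exercisers (Yes) | | | −0.463 | 0.124 | | | |
| Publication status (Unpublished) | | | −0.197 | 0.093 | | | |

Note. β₀ = intercept/mean effect size; β₁ = estimated regression coefficient; SE = standard error; R² = Explained variance; F = Omnibus test of moderators; CES-VAS = Commitment to Exercise Scale (visual analogue scale); CET = Compulsive Exercise Test; EAI = Exercise Addiction Inventory; EDS-21 = Exercise Dependence Scale-21; OEQ = Obligatory Exercise Questionnaire; LTE = Leisure time exercise. The reference categories were: Unknown (Eating disorders, Exercise modality, and Region), Original version (Test version), and Published (Publication status). Statistically significant effects (P < 0.05) appear highlighted in bold.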

* Continuous moderator.
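
The coefficients in Table 4 are expressed on a transformed scale rather than in the alpha metric. As a minimal sketch, assuming the transformation is T = −ln(1 − α) (an assumption inferred from the size of the intercepts; this section does not state it), a predicted alpha for a given moderator profile could be recovered as shown below. The function name is hypothetical.

```python
import math

def alpha_from_transformed(t: float) -> float:
    """Back-transform T = -ln(1 - alpha) to the alpha metric (assumed scale)."""
    return 1.0 - math.exp(-t)

# CET coefficients as reported in Table 4
cet_intercept = 2.039   # beta_0: all moderators at their reference category
cet_clinical = 0.564    # beta_1: Eating disorders (Clinical)

alpha_reference = alpha_from_transformed(cet_intercept)
alpha_clinical = alpha_from_transformed(cet_intercept + cet_clinical)
print(round(alpha_reference, 3), round(alpha_clinical, 3))
```

Under this assumption, the reference profile corresponds to an alpha of about 0.870 and the clinical eating disorder profile to about 0.926, which is in line with the direction of the moderator effect reported for the CET.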

Compulsive Exercise Test

The analysis examining the alpha estimates for the global score on the CET (see Forest plot in Supplementary material G) included 48 effect sizes from 42 studies (N total = 14,675). Results from the random effects model showed a pooled alpha estimate of 0.880 (P < 0.001; 95% CI = 0.868 to 0.891, I² = 92.99). Results from the univariate meta-regression analysis for categorical variables (see Table 2) identified the following significant moderators: (a) eating disorders (omnibus-test [4, 43] = 8.737; P < 0.001; R² = 43.48); (b) regular exercisers (omnibus-test [1, 46] = 6.482; P = 0.014; R² = 11.63); and (c) study design (omnibus-test [1, 46] = 4.723; P = 0.035; R² = 7.47). Results from the univariate meta-regression analysis for continuous variables (see Table 3) did not identify any significant moderators. Results from the multivariate meta-regression analysis showed that eating disorders and regular exercisers together explained 57.55% of the variance in the pooled alpha estimate (see Table 4).

Compulsive Exercise Test subscales

The analysis examining the alpha estimates for the subscale scores on the CET (see Forest plot in Supplementary material G) included 109 effect sizes. Considering the different subscales, the effect sizes available ranged from 18 (lack of exercise enjoyment, N total = 4,302) to 27 (avoidance, N total = 6,888). Findings from the random effects model showed pooled alpha estimates ranging from 0.771 (exercise rigidity; P < 0.001; 95% CI = 0.748 to 0.793, I² = 76.36) to 0.907 (avoidance; P < 0.001; 95% CI = 0.888 to 0.923, I² = 95.98). Results from the univariate meta-regression analysis for categorical variables (see Table 5) identified the following significant moderators: (a) avoidance: exercise modality (omnibus-test [3, 23] = 3.222, P = 0.041, R² = 20.10), eating disorders (omnibus-test [2, 24] = 33.606, P < 0.001, R² = 75.04), report of leisure time exercise (omnibus-test [1, 25] = 5.833, P = 0.023, R² = 16.40), regular exercisers (omnibus-test [1, 25] = 5.429, P = 0.028, R² = 14.24), and test version (omnibus-test [1, 25] = 5.455, P = 0.028, R² = 16.21); (b) weight control: type of survey (omnibus-test [2, 18] = 5.322, P = 0.015, R² = 35.20); and (c) exercise rigidity: region (omnibus-test [4, 18] = 4.535, P = 0.010, R² = 41.51) and study design (omnibus-test [1, 21] = 5.334, P = 0.031, R² = 17.36). The results of the univariate meta-regression analysis for continuous variables (see Table 6) identified the following significant moderators: (a) SD of test score (avoidance and mood improvement); (b) age (avoidance); (c) SD of age (avoidance and mood improvement); (d) year of publication (avoidance and weight control); and (e) percentage of females (weight control and exercise rigidity). However, the results of the multivariate meta-regression analysis (see Table 7) supported the moderating role of the variables under examination only in the following cases: (a) eating disorders and SD of test score (avoidance); (b) percentage of females and year of publication (weight control); (c) SD of test score and SD of age (mood improvement); and (d) region and percentage of females (exercise rigidity). The amount of variance in pooled alpha estimates explained by the retained models in the multivariate meta-regression analyses ranged from 63.26% (weight control) to 86.08% (avoidance).

Table 5.

Results of univariable meta-regression analyses for categorical variables (subscale scores of the Compulsive Exercise Test)

SubgroupsAvoidanceWeight controlMood improvementLack of enjoymentExercise rigidity
K95% CII2K95% CII2K95% CII2K95% CII2K95% CII2
LoUpLoUpLoUpLoUpLoUp
Exercise modality
Unknown (RC)180.9220.9010.93896.19120.7970.7480.83692.92110.8300.7930.86091.25110.7870.7270.83393.27160.7640.7360.78977.44
Unclear60.8570.8270.88068.9060.8640.8460.87922.9060.7960.7280.84685.1860.7580.7310.7830.0160.8000.7560.83662.75
Power disciplines
Non-endurance10.8700.8310.90010.7500.6700.81010.7700.6970.82610.7700.6890.83010.7200.6210.793
Multiple sports20.8900.8430.92488.1420.8180.7980.9360.0020.7360.6700.78967.36
Fitness and health
Endurance
Eating disorders
Unknown (RC)150.8760.8560.89390.17120.8180.7780.85192.22120.8060.7750.83284.76120.7700.7260.80788.23150.7640.7320.79182.02
At risk
Not at risk
Mixed40.8930.8640.91879.7340.8080.7800.83336.9230.7440.6950.78455.3010.7700.6890.83010.7200.6210.793
Clinical80.9530.9470.95944.9650.8280.7300.89091.4650.8490.7690.90190.3250.8000.6980.86787.4270.7980.7670.82534.89
Report of LTE
No (RC)160.9210.9000.93996.07100.8090.7610.84791.3790.8230.7700.86593.7780.7810.7300.83888.71130.7680.7390.79370.21
Yes110.8800.8520.90391.86110.8240.7840.85689.97110.7960.7610.82683.04100.7660.7140.80987.17100.7780.7340.81482.07
Regular exercisers
Unknown (RC)190.9190.8980.93596.59130.8040.7690.83489.48120.8220.7820.85593.01110.7970.7470.83790.97160.7660.7400.79074.34
Yes80.8730.8440.89783.6280.8340.7830.87389.4480.7880.7380.82982.3670.7410.6890.78569.0970.7830.7300.82676.72
Region
Unknown (RC)60.9350.9140.95188.0630.7890.7340.83376.5830.8270.7630.87487.5330.7640.5960.86294.8360.7550.7070.79564.68
South America
Oceania50.9040.8640.93287.2550.8690.8320.89866.0250.7770.7230.82054.5040.7760.7250.81715.3040.8250.7880.8550.00
North America20.8960.8190.94092.2920.7740.6190.86685.9310.8500.8160.87810.7700.7130.81610.8000.7500.840
Mixed20.9320.8790.96288.4620.8580.7900.90476.1820.8600.8380.8790.0020.7850.6890.85168.8620.8420.8140.8650.00
Europe120.8870.8470.91797.5090.7910.7430.83191.9290.8000.7390.84795.1880.8000.7080.83493.13100.7460.7130.77775.82
Asia
Test version
Original (RC)230.8990.8780.91695.75210.8170.7870.84290.72200.8090.7790.83690.71180.7770.7390.81088.08190.7760.7500.80079.31
Linguistically adapted40.9430.9200.96086.8040.7480.6910.79549.57
Type of survey
Unknown (RC)170.9140.8890.93497.04170.8320.8070.85384.49130.8090.7740.84087.15110.7780.7270.82089.17180.7690.7440.79168.54
Paper-pencil50.8950.8400.93193.8310.6200.5360.68930.8540.7280.92296.7340.7700.6440.85294.8630.7460.5930.84291.87
On-line50.8920.8570.91885.8530.7670.6780.83190.9040.7780.7540.80017.0130.7830.7350.82320.8070.7760.83419.73
Both
Publication status
Published (RC)230.9120.8910.92896.45170.8160.7810.84692.43160.8210.7870.84991.73140.7660.7190.80589.12190.7760.7500.79876.70
Unpublished40.8730.8420.89869.7140.8190.7660.85975.0840.7610.7140.80050.1440.8157580.85974.1740.7520.6730.81274.92
Study design
Psychometric (RC)110.9040.8660.93197.23100.8280.7910.85789.21100.8170.7630.85894.1580.7800.7260.82385.3480.8020.7690.83170.24
Applied160.9090.8870.92794.64110.8060.7570.84691.20100.8050.7700.83181.90100.7750.7170.82289.44150.7530.7240.78072.36

Note: = Corrected coefficient alpha. CI = Confidence interval; Lo = Lower; Up = Upper; RC = Reference category. LTE = Leisure time exercise.

Table 6.

Results of univariable meta-regression analyses for continuous variables (subscale scores of the Compulsive Exercise Test)

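| Moderator | Subscale | K | β₁ | F | P | R² |
|---|---|---|---|---|---|---|
| Mean total scores | Avoidance | 20 | 0.314 | 2.627 | 0.122 | 8.29 |
| Mean total scores | Weight control | 18 | −0.200 | 3.473 | 0.080 | 17.27 |
| Mean total scores | Mood improvement | 17 | −0.159 | 0.696 | 0.417 | 0.00 |
| Mean total scores | Lack of enjoyment | 15 | −0.118 | 1.345 | 0.267 | 1.84 |
| Mean total scores | Exercise rigidity | 17 | −0.087 | 0.840 | 0.374 | 0.00 |
| SD total scores | Avoidance | 20 | 1.383 | 41.712 | <0.001 | 70.92 |
| SD total scores | Weight control | 18 | −0.454 | 0.739 | 0.403 | 0.00 |
| SD total scores | Mood improvement | 17 | 1.912 | 30.996 | <0.001 | 71.45 |
| SD total scores | Lack of enjoyment | 15 | −0.072 | 0.057 | 0.814 | 0.00 |
| SD total scores | Exercise rigidity | 17 | 0.371 | 1.675 | 0.215 | 2.51 |
| Mean age | Avoidance | 27 | 0.038 | 4.748 | 0.039 | 14.91 |
| Mean age | Weight control | 21 | −0.024 | 0.013 | 0.081 | 8.75 |
| Mean age | Mood improvement | 20 | 0.026 | 4.357 | 0.051 | 14.75 |
| Mean age | Lack of enjoyment | 18 | 0.021 | 2.182 | 0.159 | 7.12 |
| Mean age | Exercise rigidity | 23 | 0.007 | 0.330 | 0.572 | 0.00 |
| SD age | Avoidance | 27 | 0.071 | 9.548 | 0.005 | 27.69 |
| SD age | Weight control | 21 | −0.029 | 1.851 | 0.190 | 3.14 |
| SD age | Mood improvement | 20 | 0.045 | 4.916 | 0.040 | 17.60 |
| SD age | Lack of enjoyment | 18 | 0.033 | 1.930 | 0.184 | 4.68 |
| SD age | Exercise rigidity | 23 | 0.006 | 0.140 | 0.712 | 0.00 |
| % of Whites | Avoidance | 7* | −0.002 | 0.062 | 0.813 | 0.00 |
| % of Whites | Weight control | 7* | 0.013 | 2.477 | 0.176 | 23.21 |
| % of Whites | Mood improvement | 6* | −0.006 | 2.714 | 0.175 | 32.78 |
| % of Whites | Lack of enjoyment | 6* | 0.002 | 0.078 | 0.794 | 0.00 |
| % of Whites | Exercise rigidity | 6* | −0.006 | 2.188 | 0.213 | 61.24 |
| % of Females | Avoidance | 27 | 0.003 | 1.049 | 0.316 | 0.12 |
| % of Females | Weight control | 21 | 0.006 | 8.154 | 0.010 | 29.83 |
| % of Females | Mood improvement | 20 | 0.003 | 1.202 | 0.287 | 2.56 |
| % of Females | Lack of enjoyment | 18 | 0.001 | 0.150 | 0.703 | 0.00 |
| % of Females | Exercise rigidity | 23 | 0.004 | 8.807 | 0.007 | 38.16 |
| Year of publication | Avoidance | 27 | 0.071 | 10.694 | 0.003 | 28.75 |
| Year of publication | Weight control | 21 | −0.046 | 6.218 | 0.022 | 27.52 |
| Year of publication | Mood improvement | 20 | 0.026 | 1.599 | 0.222 | 3.60 |
| Year of publication | Lack of enjoyment | 18 | 0.001 | 0.001 | 0.971 | 0.00 |
| Year of publication | Exercise rigidity | 23 | −0.011 | 0.654 | 0.428 | 0.00 |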

Note. β₁ = estimated regression coefficient; R² = Explained variance; F = Omnibus test of moderators. Statistically significant effects (P < 0.05) appear highlighted in bold.

* Corresponds to K < 10 and should therefore not be interpreted (Fu et al., 2011).

Table 7.

Results of multivariable meta-regression analyses (subscale scores of the Compulsive Exercise Test)

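| Moderators | K | β₀ | β₁ | SE | F | P | R² |
|---|---|---|---|---|---|---|---|
| Avoidance | 27 | | | | 26.516 | <0.001 | 86.08 |
| Intercept | | 1.300 | | 0.263 | | | |
| Eating disorders (Mixed) | | | −0.020 | 0.132 | | | |
| Eating disorders (Clinical) | | | 0.615 | 0.182 | | | |
| SD total score* | | | 0.806 | 0.245 | | | |
| Weight control | 21 | | | | 9.335 | 0.002 | 63.26 |
| Intercept | | 2.418 | | 0.436 | | | |
| % of Females* | | | 0.005 | 0.002 | | | |
| Year of publication* | | | −0.042 | 0.015 | | | |
| Mood improvement | 20 | | | | 20.014 | <0.001 | 81.45 |
| Intercept | | −0.325 | | 0.340 | | | |
| SD total score* | | | 1.777 | 0.321 | | | |
| SD age* | | | 0.0264 | 0.013 | | | |
| Exercise rigidity | 23 | | | | 5.427 | 0.004 | 73.70 |
| Intercept | | 1.144 | | 0.132 | | | |
| Region (Oceania) | | | 0.289 | 0.135 | | | |
| Region (North America) | | | 0.228 | 0.172 | | | |
| Region (Mixed) | | | 0.407 | 0.139 | | | |
| Region (Europe) | | | 0.030 | 0.090 | | | |
| % of Females* | | | 0.003 | 0.001 | | | |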

Note. β₀ = intercept/mean effect size; β₁ = estimated regression coefficient; R² = Explained variance; F = Omnibus test of moderators. Unknown was considered as the reference category both for Eating disorders and Region. Statistically significant effects (P < 0.05) appear highlighted in bold.

* Continuous moderator.

Exercise Addiction Inventory

The retrieved studies included multiple versions of the EAI. Since only one study reported an alpha value for the EAI-R (α = 0.90; Szabo, Pinto, Griffiths, Kovácsik, & Demetrovics, 2019), this version was excluded from the analyses. The analysis examining the alpha estimates for the global score on the EAI (see Forest plot in Supplementary material G) included 42 effect sizes from 40 studies (N total = 26,565). Results from the random effects model showed a pooled alpha estimate of 0.768 (P < 0.001; 95% CI = 0.739 to 0.810, I² = 97.27). Results from the univariate meta-regression analysis for categorical variables (see Table 2) identified the following significant moderators: (a) region (omnibus-test [5, 36] = 5.182; P = 0.001; R² = 35.78); (b) test version (omnibus-test [1, 40] = 4.264; P = 0.046; R² = 7.46); and (c) publication status (omnibus-test [1, 40] = 4.720; P = 0.036; R² = 8.50). Results from the univariate meta-regression analysis for continuous variables (see Table 3) identified the mean of test score as a significant moderator. Results from the multivariate meta-regression analysis (see Table 4) showed that region, test version, and mean of test score together explained 59.22% of the variance in the pooled alpha estimate.

Exercise Dependence Questionnaire

The analysis examining the alpha estimates for the global score on the EDQ (see Forest plot in Supplementary material G) included 12 effect sizes from 11 studies (N total = 2,961). Results from the random effects model showed a pooled alpha estimate of 0.862 (P < 0.001; 95% CI = 0.842 to 0.879, I² = 84.26). Since the number of effect sizes available was <15, moderation analyses were not performed.
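
To make the pooled estimate and the I² statistic reported above more concrete, the following is a minimal sketch of random-effects pooling. It uses the closed-form DerSimonian-Laird estimator of the between-study variance rather than the restricted maximum likelihood method employed in this study, so it illustrates the logic and does not reproduce the reported values. The function name and the input values are hypothetical.

```python
def pool_random_effects(effects, variances):
    """DerSimonian-Laird pooling; returns (pooled effect, tau^2, I^2 in %)."""
    w = [1.0 / v for v in variances]                      # fixed-effect weights
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                         # between-study variance
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]        # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_star, effects)) / sum(w_star)
    return pooled, tau2, i2

# Made-up transformed reliability estimates and sampling variances
# (illustrative only; not taken from the retrieved studies).
effects = [1.85, 2.10, 1.95, 2.30]
variances = [0.010, 0.015, 0.008, 0.020]
pooled, tau2, i2 = pool_random_effects(effects, variances)
print(pooled, tau2, i2)
```

High I² values such as those reported here indicate that most of the observed variability in alpha reflects between-sample differences rather than sampling error, which is what motivates the moderator analyses.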

Exercise Dependence Questionnaire subscales

The analyses examining the alpha estimates for the subscale scores on the EDQ (see Forest plot in Supplementary material G) included 50 single alpha scores. The effect sizes available ranged from six (positive reward, N total = 1,405) to seven (interference, N total = 1,498). Findings from the random effects model showed pooled alpha estimates ranging from 0.615 (social reasons; P < 0.001; 95% CI = 0.489 to 0.710, I² = 88.86) to 0.789 (positive reward; P < 0.001; 95% CI = 0.688 to 0.857, I² = 94.89). Since the number of effect sizes available was <15, moderation analyses were not performed.

Exercise Dependence Scale-21

The analysis examining the reliability estimates for the global score on the EDS-21 (see Forest plot in Supplementary material G) included 90 effect sizes from 84 studies (N total = 35,918). Results from the random effects model showed a pooled alpha estimate of 0.930 (P < 0.001; 95% CI = 0.923 to 0.937, I² = 97.96). Results from the univariate meta-regression analysis for categorical variables (see Table 2) identified both exercise modality (omnibus-test [6, 83] = 4.100; P = 0.001; R² = 18.00) and test version (omnibus-test [1, 88] = 5.930; P = 0.017; R² = 5.24) as significant moderators. Results from the univariate meta-regression analysis for continuous variables (see Table 3) identified both the mean and the SD of test scores as significant moderators. Results from the multivariate meta-regression analysis showed that exercise modality, test version, and the mean and SD of test scores together explained 38.02% of the variance in pooled alpha estimates (see Table 4).

Exercise Dependence Scale-21 subscales

The analyses examining the reliability estimates for the subscale scores on the EDS-21 (see Forest plot in Supplementary material G) included a total of 311 effect sizes. The effect sizes available ranged from 42 (withdrawal, N total = 15,457) to 53 (reduction in other activities, N total = 18,755). Findings from the random effects model showed pooled alpha estimates ranging from 0.704 (reduction in other activities; P < 0.001; 95% CI = 0.675 to 0.730, I² = 92.53) to 0.881 (intention effects; P < 0.001; 95% CI = 0.865 to 0.895, I² = 95.48). Results from the univariate meta-regression analysis for categorical variables (see Table 8) identified the following significant moderators: (a) tolerance: region (omnibus-test [5, 37] = 4.528, P = 0.003, R² = 31.52), test version (omnibus-test [1, 41] = 6.763, P = 0.013, R² = 13.49), and publication status (omnibus-test [1, 41] = 4.440, P = 0.041, R² = 8.69); (b) withdrawal: region (omnibus-test [5, 36] = 10.317, P < 0.001, R² = 61.22) and test version (omnibus-test [1, 40] = 18.992, P < 0.001, R² = 34.95); (c) intention effects: report of leisure time exercise (omnibus-test [1, 41] = 4.465, P = 0.041, R² = 7.92), regular exercisers (omnibus-test [1, 41] = 5.434, P = 0.025, R² = 10.36), region (omnibus-test [5, 37] = 10.661, P < 0.001, R² = 55.86), test version (omnibus-test [1, 41] = 28.574, P < 0.001, R² = 42.29), and publication status (omnibus-test [1, 41] = 8.651, P = 0.005, R² = 16.05); (d) lack of control: region (omnibus-test [5, 37] = 10.661, P < 0.001, R² = 54.87), test version (omnibus-test [1, 42] = 28.574, P < 0.001, R² = 42.99), publication status (omnibus-test [1, 42] = 4.475, P = 0.040, R² = 8.40), and study design (omnibus-test [1, 42] = 5.792, P = 0.021, R² = 9.99); (e) time: region (omnibus-test [5, 37] = 5.849, P < 0.001, R² = 41.55) and test version (omnibus-test [1, 41] = 7.396, P = 0.010, R² = 15.06); and (f) continuance: region (omnibus-test [5, 37] = 6.759, P < 0.001, R² = 45.41) and test version (omnibus-test [1, 41] = 7.716, P = 0.008, R² = 15.95). The results of the univariate meta-regression analysis for continuous variables (see Table 9) identified the following significant moderators: (a) test mean score (lack of control); (b) SD of test score (tolerance); and (c) percentage of females (tolerance, intention effects, lack of control, time, and continuance). The results of the multivariate meta-regression analysis (see Table 10) supported the moderating role of the following variables: (a) SD of test scores and percentage of females (tolerance); (b) region and percentage of females (intention effects); (c) region and percentage of females (lack of control); (d) test version and percentage of females (time); and (e) region, test version, and percentage of females (continuance). The amount of variance in pooled alpha estimates explained by the retained models in the multivariate meta-regression analyses ranged from 27.97% (tolerance) to 67.73% (intention effects).

Table 8.

Results of univariable meta-regression analyses for categorical variables (subscale scores of the Exercise Dependence Scale-21)

SubgroupsToleranceWithdrawalIntention effectsLack of controlTimeReduction in other activitiesContinuance
K95% CII2K95% CII2K95% CII2K95% CII2K95% CII2K95% CII2K95% CII2
LoUpLoUpLoUpLoUpLoUpLoUpLoUp
Exercise modality
Unknown (RC)80.8920.8590.91791.3480.8380.7930.87490.2790.9090.8770.93394.4290.8290.7620.87895.8580.8490.8110.80088.67130.7200.6390.78293.6790.8110.7480.85894.43
Unclear180.8490.8230.87093.48170.8050.7760.82990.77170.8720.8450.89495.32170.8240.7890.85394.99180.8540.8250.87895.41180.7070.6670.74190.22170.8380.8070.86394.42
Power disciplines20.7840.6900.84969.7020.8350.7990.8650.0010.8900.8540.81720.7650.7140.8070.0020.8050.7630.8400.0030.7620.7180.7997.6120.8440.6930.92191.47
Non-endurance20.8220.7910.8480.0020.8030.7600.83834.2820.8080.7750.8360.0020.8390.7550.89585.7420.8340.8060.8590.0020.6060.4960.69258.1020.7900.7540.8210.00
Multiple sports60.8530.7980.89294.8660.8300.7790.86992.5960.8810.8330.91595.5660.8110.7500.8570.93.4160.8440.8050.87589.7960.7490.6460.82295.7560.8430.8170.86576.98
Fitness and health40.8360.7510.89296.3840.8690.7640.92798.1740.8840.8430.91593.2640.8360.8020.86481.9330.8680.8380.89383.7150.7030.6170.76988.7140.8760.8300.90993.58
Endurance30.8910.8590.91573.5730.8650.8300.89267.6240.8710.7740.92697.3040.8130.7610.85585.3940.8250.8060.84323.8560.6140.5510.66777.3830.8060.7400.85580.53
Eating disorders
Unknown (RC)410.8580.8410.87494.13400.8310.8110.84893.02400.8820.8650.89795.43420.8230.8020.84294.11410.8490.8340.86392.11480.7060.6760.73492.79410.8370.8190.85493.14
At risk
Not at risk10.8200.7880.84710.7700.7290.80510.8500.8230.87310.8400.8110.86410.8000.7640.83110.6800.6220.72910.7200.7000.763
Mixed10.8100.7590.85110.7800.7210.82720.8710.6680.95097.9210.8000.7460.84310.8400.7970.87420.6430.4510.76889.9810.7900.7330.835
Clinical
Report of LTE
No (RC)120.8740.8420.90096.27110.8450.8160.86992.42120.9030.8700.92897.62130.8410.7990.87496.77130.8630.8410.88392.28190.7010.6360.75495.87120.8290.7920.86094.96
Yes310.8490.8290.86691.93310.8220.7970.84392.45310.8710.8540.88692.65310.8140.7920.83490.68300.8400.8210.85890.79340.7050.6740.73388.82310.8360.8140.85592.36
Regular exercisers
Unknown (RC)160.8730.8470.89595.26150.8400.8150.86190.91170.9000.8750.92096.73170.8390.8060.86695.40170.8640.8460.88089.72257.020.6500.74695.13160.8310.8020.85593.08
Yes270.8460.8240.86592.15270.8210.7940.84593.22260.8660.8470.88492.68270.8120.7860.83491.82260.8360.8130.85691.64280.7060.6730.73688.11270.8360.8110.85893.27
Region
Unknown (RC)60.8810.8460.90787.4560.8540.8240.87976.9670.9090.8800.93191.1680.8470.8070.87991.4170.8660.8380.88983.77130.7260.6340.79595.5670.8650.8390.88679.85
South America40.7800.7370.81667.2140.7480.6460.82090.9430.8380.7900.87582.5440.7540.7120.79159.4740.7790.7210.82479.9550.7430.6390.81791.9440.8340.7720.87889.22
Oceania10.9200.9030.93410.8900.8660.91010.9300.9150.94310.9200.9030.93410.9400.9270.95110.7600.7080.80310.9300.9150.943
North America80.8910.8540.91895.5280.8850.8600.90690.0890.9240.9120.93585.8080.8620.8320.88790.2880.8700.8450.89187.76100.6740.6250.71784.5180.8710.8470.89286.75
Mixed
Europe220.8470.8270.86490.65210.8090.7950.82372.0890.8450.8230.86491.68210.7970.7660.82392.74220.8380.8200.85487.73220.6880.6480.72390.90210.7960.7700.81990.36
Asia20.8070.7520.85060.9320.7490.7070.7860.0020.8860.8660.9020.0020.8320.7620.88279.5910.8400.8020.87120.7410.6970.7790.0020.8410.8140.8640.00
Test version
Original (RC)180.8780.8530.89994.05180.8630.8400.88290.84190.9120.8960.92693.34200.8490.8240.87192.59190.8680.8470.88590.64250.7120.6690.74992.59190.8580.8300.88193.73
Linguistically adapted250.8390.8190.85791.33240.7980.7770.81686.27240.8490.8300.86691.18240.7970.7690.82192.47240.8310.8110.84990.09280.6970.6560.73292.43240.8120.7910.83188.98
Type of survey
Unknown (RC)180.8590.8310.88295.00240.8360.8090.85994.41120.8960.8660.91995.59180.8070.7690.83994.61220.8380.8170.85691.16320.7020.6580.74093.58220.8510.8310.86991.33
Paper-pencil150.8630.8350.88693.4690.8070.7640.84290.27170.8860.8620.90595.44110.8300.7880.86494.33110.8560.8280.88090.0690.6900.6160.74993.5080.8080.7510.85294.16
On-line70.8130.7750.84586.2080.8230.7840.85586.25120.8620.8220.89395.65120.8230.8010.84277.6790.8580.8160.89093.50110.7070.6760.73569.15110.8190.7710.85794.49
Both30.8960.8630.92170.0510.8500.8090.88220.8420.7950.8770.0030.8780.7790.93394.3510.9000.8590.92910.8400.7960.87520.8180.7040.88881.46
Publication status
Published (RC)400.8520.8350.86893.56390.8250.8040.84393.02390.8740.8570.88994.93410.8170.7970.83693.13400.8490.8330.86392.03490.7070.6760.73492.83400.8330.8130.85093.57
Unpublished30.9060.8630.93688.8930.8760.8540.89438.5940.9310.9130.94683.5930.8820.8010.93194.2630.8370.7510.89391.0740.6690.5780.74184.9930.8550.7980.89584.94
Study design
Psychometric (RC)150.8430.8120.86995.02150.8150.7710.85296.66160.8730.8460.89495.73150.7890.7480.82495.04150.8380.8100.86394.11160.7120.6640.75493.49150.8320.7970.86095.34
Applied280.8640.8430.88292.78270.8350.8160.85286.96270.8860.8640.90395.07290.8380.8170.85791.50280.8530.8350.86989.90370.7000.6630.73391.87280.8350.8120.85591.60

Note. = Corrected coefficient alpha. CI = Confidence interval; Lo = Lower; Up = Upper; RC = Reference category. LTE = Leisure time exercise.

Table 9.

Results of univariable meta-regression analyses for continuous variables (subscale scores of the Exercise Dependence Scale-21)

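| Moderator | Subscale | K | β₁ | F | P | R² |
|---|---|---|---|---|---|---|
| Mean total scores | Tolerance | 36 | −0.183 | 3.573 | 0.067 | 8.02 |
| Mean total scores | Withdrawal | 36 | −0.035 | 0.312 | 0.580 | 9.59 |
| Mean total scores | Intention effects | 37 | −0.170 | 1.266 | 0.268 | 1.19 |
| Mean total scores | Lack of control | 38 | −0.294 | 8.745 | 0.006 | 19.17 |
| Mean total scores | Time | 36 | −0.129 | 2.629 | 0.114 | 5.44 |
| Mean total scores | Reduction in other activities | 41 | −0.096 | 0.748 | 0.393 | 0.00 |
| Mean total scores | Continuance | 37 | 0.060 | 0.256 | 0.616 | 0.00 |
| SD total scores | Tolerance | 36 | 0.623 | 4.524 | 0.041 | 9.22 |
| SD total scores | Withdrawal | 36 | −0.060 | 0.054 | 0.818 | 9.58 |
| SD total scores | Intention effects | 37 | 0.082 | 0.041 | 0.840 | 0.00 |
| SD total scores | Lack of control | 38 | −0.075 | 0.097 | 0.758 | 0.00 |
| SD total scores | Time | 36 | 0.210 | 0.539 | 0.468 | 0.00 |
| SD total scores | Reduction in other activities | 41 | −0.008 | 0.002 | 0.967 | 0.00 |
| SD total scores | Continuance | 37 | −0.458 | 2.319 | 0.137 | 4.02 |
| Mean age | Tolerance | 41 | −0.001 | 0.003 | 0.957 | 0.00 |
| Mean age | Withdrawal | 40 | −0.003 | 0.179 | 0.675 | 14.64 |
| Mean age | Intention effects | 40 | −0.004 | 0.134 | 0.716 | 0.00 |
| Mean age | Lack of control | 42 | 0.003 | 0.140 | 0.710 | 0.00 |
| Mean age | Time | 41 | −0.003 | 0.198 | 0.659 | 0.00 |
| Mean age | Reduction in other activities | 49 | 0.006 | 0.519 | 0.475 | 0.00 |
| Mean age | Continuance | 41 | −0.011 | 2.253 | 0.141 | 3.84 |
| SD age | Tolerance | 40 | −0.010 | 0.430 | 0.516 | 0.00 |
| SD age | Withdrawal | 39 | −0.019 | 1.616 | 0.212 | 14.32 |
| SD age | Intention effects | 39 | −0.001 | 0.002 | 0.966 | 0.00 |
| SD age | Lack of control | 41 | −0.004 | 0.055 | 0.815 | 0.00 |
| SD age | Time | 40 | −0.001 | 0.009 | 0.927 | 0.00 |
| SD age | Reduction in other activities | 48 | 0.022 | 2.342 | 0.133 | 2.34 |
| SD age | Continuance | 40 | −0.014 | 0.986 | 0.327 | 0.87 |
| % of Whites | Tolerance | 8* | 0.010 | 2.638 | 0.156 | 19.41 |
| % of Whites | Withdrawal | 8* | 0.008 | 3.708 | 0.103 | 9.21 |
| % of Whites | Intention effects | 10 | 0.003 | 0.401 | 0.544 | 0.00 |
| % of Whites | Lack of control | 9* | 0.001 | 0.044 | 0.841 | 0.00 |
| % of Whites | Time | 7* | −0.007 | 0.494 | 0.513 | 0.00 |
| % of Whites | Reduction in other activities | 12 | 0.002 | 0.117 | 0.740 | 0.00 |
| % of Whites | Continuance | 9* | 0.001 | 0.112 | 0.748 | 0.00 |
| % of Females | Tolerance | 40 | 0.005 | 4.256 | 0.046 | 8.58 |
| % of Females | Withdrawal | 39 | 0.003 | 2.242 | 0.143 | 13.88 |
| % of Females | Intention effects | 40 | 0.006 | 5.420 | 0.025 | 11.17 |
| % of Females | Lack of control | 41 | 0.007 | 12.342 | 0.001 | 24.97 |
| % of Females | Time | 40 | 0.008 | 17.577 | <0.001 | 32.19 |
| % of Females | Reduction in other activities | 50 | 0.002 | 0.646 | 0.426 | 0.00 |
| % of Females | Continuance | 40 | 0.005 | 6.018 | 0.019 | 12.29 |
| Year of publication | Tolerance | 43 | 0.005 | 0.126 | 0.725 | 0.00 |
| Year of publication | Withdrawal | 42 | −0.022 | 2.740 | 0.106 | 12.99 |
| Year of publication | Intention effects | 43 | −0.012 | 0.559 | 0.459 | 0.00 |
| Year of publication | Lack of control | 44 | −0.004 | 0.065 | 0.800 | 0.00 |
| Year of publication | Time | 43 | −0.003 | 0.041 | 0.842 | 0.00 |
| Year of publication | Reduction in other activities | 53 | −0.001 | 0.012 | 0.913 | 0.00 |
| Year of publication | Continuance | 43 | −0.007 | 0.258 | 0.614 | 0.00 |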

Note. β₁ = estimated regression coefficient; R² = Explained variance; F = Omnibus test of moderators. Statistically significant effects (P < 0.05) appear highlighted in bold.

Table 10.

Results of multivariable meta-regression analyses (subscale scores of the Exercise Dependence Scale-21)

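| Moderators | K | β₀ | β₁ | SE | F | P | R² |
|---|---|---|---|---|---|---|---|
| Tolerance | 43 | | | | 5.591 | 0.008 | 27.97 |
| Intercept | | 0.825 | | 0.387 | | | |
| SD total scores* | | | 0.697 | 0.277 | | | |
| % of Females* | | | 0.006 | 0.002 | | | |
| Withdrawal | 42 | | | | 10.550 | <0.001 | 67.73 |
| Intercept | | 1.925 | | 0.099 | | | |
| Region (South America) | | | −0.569 | 0.154 | | | |
| Region (Oceania) | | | 0.283 | 0.251 | | | |
| Region (North America) | | | 0.243 | 0.128 | | | |
| Region (Europe) | | | −0.270 | 0.111 | | | |
| Region (Asia) | | | −0.539 | 0.196 | | | |
| Intention effects | 43 | | | | 9.240 | <0.001 | 69.91 |
| Intercept | | 2.596 | | 0.188 | | | |
| Report of LTE (Yes) | | | −0.306 | 0.107 | | | |
| Region (South America) | | | −0.339 | 0.217 | | | |
| Region (Oceania) | | | 0.414 | 0.322 | | | |
| Region (North America) | | | 0.216 | 0.139 | | | |
| Region (Europe) | | | −0.482 | 0.123 | | | |
| Region (Asia) | | | −0.090 | 0.241 | | | |
| % of Females* | | | −0.000 | 0.002 | | | |
| Lack of control | 44 | | | | 4.592 | 0.002 | 47.07 |
| Intercept | | 1.661 | | 0.146 | | | |
| Region (South America) | | | −0.440 | 0.205 | | | |
| Region (Oceania) | | | 0.375 | 0.337 | | | |
| Region (North America) | | | 0.032 | 0.152 | | | |
| Region (Europe) | | | −0.263 | 0.126 | | | |
| Region (Asia) | | | −0.264 | 0.250 | | | |
| % of Females* | | | 0.005 | 0.002 | | | |
| Time | 43 | | | | 14.198 | <0.001 | 47.48 |
| Intercept | | 1.683 | | 0.100 | | | |
| Test version (Linguistically adapted) | | | −0.218 | 0.078 | | | |
| % of Females* | | | 0.007 | 0.002 | | | |
| Continuance | 43 | | | | 6.847 | <0.001 | 65.81 |
| Intercept | | 2.004 | | 0.148 | | | |
| Region (South America) | | | −0.567 | 0.257 | | | |
| Region (Oceania) | | | 0.665 | 0.290 | | | |
| Region (North America) | | | 0.057 | 0.133 | | | |
| Region (Europe) | | | −0.955 | 0.248 | | | |
| Region (Asia) | | | −0.770 | 0.292 | | | |
| Test version (Linguistically adapted) | | | 0.600 | 0.226 | | | |
| % of Females* | | | −0.000 | 0.002 | | | |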

Note. β₀ = intercept/mean effect size; β₁ = estimated regression coefficient; R² = Explained variance; F = Omnibus test of moderators; LTE = Leisure time exercise. The reference categories were: No (Report of LTE), Unknown (Region), and Original version (Test version). Statistically significant effects (P < 0.05) appear highlighted in bold.

* Continuous moderator.

Obligatory Exercise Questionnaire

The analysis examining the reliability estimates for the global score on the OEQ (see Forest plot in Supplementary material G) included 38 effect sizes from 33 primary studies (N total = 10,548). Results from the random effects model showed a pooled alpha estimate of 0.870 (P < 0.001; 95% CI = 0.853 to 0.885, I² = 84.43). Results from the univariate meta-regression analysis for categorical variables (see Table 2) identified both exercise modality (omnibus-test [3, 34] = 9.568; P < 0.001; R² = 43.48) and regular exercisers (omnibus-test [1, 36] = 10.087; P = 0.003; R² = 22.55) as significant moderators. Results from the univariate meta-regression analysis for continuous variables (see Table 3) did not identify any significant moderators. Results from the multivariate meta-regression analysis showed that exercise modality and regular exercisers together explained 68.55% of the variance in pooled alpha estimates (see Table 4).

Reliability reporting practices

A total of 118 studies reported induced reliability (e.g., based on other studies), eleven studies reported unusable reliability indices (i.e., reliability ranges), and eight studies did not report alpha or Pearson’s correlation but other reliability indices (i.e., ω, Meule et al., 2021; ρ, Alcaraz-Ibáñez, Aguilar-Parra, & Álvarez-Hernández, 2018; Sicilia, Alcaraz-Ibáñez, Lirola, Burgueño, & Maher, 2018; ave, Egan et al., 2017; or ICC, Parastatidou, Doganis, Theodorakis, & Vlachopoulos, 2012; Sicilia et al., 2013, 2017; Sicilia & González-Cutre, 2011). A global reliability induction rate of 47.58% was found. This ranged from 18.64% to 57.14% in the case of the global scores and from 14.93% to 66.67% in the case of subscale scores (see Table 11).

Table 11.

Reliability reporting practices in studies using self-report instruments assessing problematic exercise

Measure (Subscale)Induced reliabilityReported reliability
By omissionVague reportPrecise reportInduction rateUnusableUsable
K (%)K (%)K (%)%K (%)K (%)
CES-Likert5 (31.25)31.251 (6.25)10 (62.50)
CES-VAS14 (27.45)2 (3.92)5 (9.80)41.1830 (58.82)
CET7 (11.86)3 (5.08)1 (1.69)18.6448 (81.36)
CET (Avoidance)5 (13.16)4 (10.53)1 (2.63)26.321 (2.63)27 (71.05)
CET (Weight control)5 (16.13)4 (12.90)29.031 (3.23)21 (67.74)
CET (Mood improvement)5 (16.67)4 (13.33)30.001 (3.33)20 (66.67)
CET (Lack of enjoyment)5 (18.52)4 (14.81)33.3318 (66.67)
CET (Rigidity)5 (15.15)4 (12.12)1 (3.03)30.3023 (69.70)
EAI26 (26.80)9 (9.28)17 (17.53)53.612 (2.06)43 (44.33)
EDQ3 (10.71)5 (17.86)8 (28.57)57.1412 (42.86)
EDQ (Interference)1 (5.56)5 (27.78)5 (27.78)61.117 (38.89)
EDQ (Positive reward)1 (5.88)5 (29.41)5 (29.41)64.716 (35.29)
EDQ (Withdrawal)1 (5.56)5 (27.78)5 (27.78)61.117 (38.89)
EDQ (Weight control)2 (11.11)5 (27.78)5 (27.78)66.676 (33.33)
EDQ (Insight into problem)1 (5.88)5 (29.41)5 (29.41)64.716 (35.29)
EDQ (Social reasons)2 (11.11)5 (27.78)5 (27.78)66.676 (33.33)
EDQ (Health reasons)2 (11.11)5 (27.78)5 (27.78)66.676 (33.33)
EDQ (Stereotyped behaviour)1 (5.88)5 (29.41)5 (29.41)64.716 (35.29)
EDS-218 (6.30)15 (11.81)6 (4.72)22.838 (6.30)90 (70.87)
EDS-21 (Tolerance)1 (1.75)9 (15.79)17.544 (7.02)43 (75.44)
EDS-21 (Withdrawal)1 (1.79)9 (16.07)17.864 (7.14)42 (75.00)
EDS-21 (Intention effects)1 (1.75)9 (15.79)17.544 (7.02)43 (75.44)
EDS-21 (Lack of control)1 (1.72)9 (15.52)17.244 (6.90)44 (75.86)
EDS-21 (Time)1 (1.75)9 (15.79)17.544 (7.02)43 (75.44)
EDS-21 (Reduction in other activities)1 (1.49)9 (13.43)14.934 (5.97)53 (79.10)
EDS-21 (Continuance)1 (1.75)9 (15.79)17.544 (7.02)43 (75.44)
OEQ7 (10.00)5 (7.14)19 (27.14)44.291 (1.43)38 (54.29)
Total113 (9.77)162 (14.00)98 (8.47)47.5843 (3.72)741 (64.04)

Note. CES-VAS = Commitment to Exercise Scale; CET = Compulsive Exercise Test; EAI = Exercise Addiction Inventory; EDS-21 = Exercise Dependence Scale-21; OEQ = Obligatory Exercise Questionnaire; Induced reliability = No reliability values for the data at hand are provided; By omission = No reference to reliability is made; Vague report = Some reference to reliability is made, but information concerning the source of such information is missing; Precise report = Reported reliability values correspond to those provided in other studies; Unusable = Reliability values for the data at hand are provided employing indices other than alpha; Usable = Data that were effectively included in the meta-analysis.
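
The induction rates in Table 11 appear to be the share of induced reports (by omission, vague, and precise) among all reports for a given score. A minimal sketch of that arithmetic follows, assuming that a blank cell in the table stands for zero; the function name is hypothetical.

```python
def induction_rate(by_omission, vague, precise, unusable, usable):
    """Percentage of reports that induce reliability instead of estimating it."""
    induced = by_omission + vague + precise
    total = induced + unusable + usable
    return 100.0 * induced / total

# CET global score row of Table 11 (blank "Unusable" cell assumed to be 0)
cet_rate = induction_rate(7, 3, 1, 0, 48)
# EAI row of Table 11
eai_rate = induction_rate(26, 9, 17, 2, 43)
print(round(cet_rate, 2), round(eai_rate, 2))  # 18.64 53.61
```

Both results match the induction rates listed for the CET and the EAI in Table 11.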

Concerning the assumptions required for the unbiased performance of alpha, the first one (i.e., the unidimensionality of the test) was in no case used as an argument to justify the use of alpha over other reliability indices. Despite the theoretically multidimensional nature of three of the instruments under consideration (CET, EDQ, EDS-21), alpha was frequently used as the reliability index of their global scores (see Table 1). The second assumption (i.e., the equality of the factor loadings of the items) was not examined in any of the retrieved studies. The third assumption (i.e., the independence of the error terms) was found to be tested only in the context of improving model fit (e.g., Zeeck et al., 2017), but in no case to justify the use of alpha or to comment on the implications of using it in such circumstances.

Discussion

The present RG meta-analysis provides summarized evidence on the reliability, in terms of coefficient alpha, of the scores derived from six of the most commonly used self-report instruments assessing PE. Data retrieved from 255 studies (741 independent samples) showed alpha values that ranged from 0.768 to 0.930 for global scores and from 0.615 to 0.907 for subscale scores. The alpha estimates of both global and subscale scores were affected by several sociodemographic and methodological characteristics. The main implications of these findings are discussed in detail below.

Alpha estimates for total and subscale scores

Interpretation of alpha values has generally been carried out adopting a “more is better”, cut-off-based approach. This implies that the level of reliability of the scores of a given instrument in terms of alpha would dictate the use for which it may be recommended (Cicchetti, 1994; Nunnally & Bernstein, 1994). According to this approach, the alpha estimates found for the global scores of the instruments under consideration may lead to judging them as suitable for (a) exploratory research (EAI), (b) basic research purposes (CES, CET, EDQ, and OEQ), and (c) applied research and clinical practice (EDS-21). In the case of the subscale scores, applying this same criterion implies considering them as (a) unacceptable for research purposes (insight into problem, social reasons, and stereotyped behaviour subscales of the EDQ), (b) acceptable for exploratory research (lack of enjoyment and rigidity subscales of the CET; interference, positive reward, withdrawal, weight control, and health reasons subscales of the EDQ; and reduction in other activities subscale of the EDS-21), (c) suitable for basic research purposes (weight control and mood improvement subscales of the CET; and tolerance, withdrawal, intention effects, lack of control, time, and continuance subscales of the EDS-21), and (d) suitable for applied research and clinical practice (avoidance subscale of the CET). However, the automatic application of cut-off points inherent to this purely quantitative approach to interpreting alpha has been strongly criticised on the grounds that such cut-offs do not emerge from empirical evidence but from researchers’ intuition (Cho & Kim, 2015; Hoekstra et al., 2019; Panayides, 2013). Alternatively, it has been suggested that alpha values should be interpreted taking into account both instrument length and the complexity of the construct being assessed (Cho & Kim, 2015). The implications derived from the latter are discussed separately below for the scores with particularly high or low alpha values.
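
The cut-off-based reading described above can be sketched as a simple lookup. The thresholds of 0.70, 0.80, and 0.90 used here are an assumption: they are not stated in this section, but they are consistent with how the pooled estimates are classified in the preceding paragraph. The function name is hypothetical.

```python
def classify_alpha(alpha: float) -> str:
    """Map an alpha value to a recommended use, using assumed cut-offs."""
    if alpha < 0.70:
        return "unacceptable for research purposes"
    if alpha < 0.80:
        return "exploratory research"
    if alpha < 0.90:
        return "basic research"
    return "applied research and clinical practice"

# Pooled alpha estimates for global scores reported in the Results
pooled_global = {"EAI": 0.768, "EDQ": 0.862, "OEQ": 0.870, "CET": 0.880, "EDS-21": 0.930}
for name, alpha in pooled_global.items():
    print(name, alpha, classify_alpha(alpha))
```

The sketch also makes the criticism tangible: a difference of a few thousandths around a threshold changes the recommended use, although nothing about the instrument itself has changed.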

The high alpha values obtained for some of the scores under consideration (i.e., those near or above 0.90) do not necessarily indicate that these scores are highly reliable. Indeed, high alpha values may also be due to redundancy in the content of the items, particularly as the number of items increases (Cho & Kim, 2015). This redundancy is nevertheless undesirable since it could compromise coverage of the construct being assessed: the greater the theoretical complexity of the construct, the more potentially relevant content is excluded (Hoekstra et al., 2019; Panayides, 2013). Such redundancy may also leave a considerable proportion of individuals’ estimates outside the items’ targeting range, which could result in decreased reliability (Cho & Kim, 2015; Panayides, 2013). Furthermore, it is worth noting that the instruments whose scores were found to have particularly high alpha values do not appear to have been developed with particular attention to their content validity (e.g., almost none of the corresponding studies reported that content validity had been evaluated by a panel of experts). Indeed, it was only in the case of a preliminary version of the EDS-21 that such an evaluation was indicated, although just in terms of “appropriateness” and with no further details on the procedure followed (Hausenblas & Downs, 2002). Additionally, none of the validation studies reported having examined comprehensiveness (i.e., that no key aspects of the construct are missed), an aspect of content validity that is particularly relevant in avoiding content redundancy (Mokkink et al., 2010). Consequently, further research is needed that provides evidence on whether the particularly high alpha values obtained in the present study are due to genuinely high score reliability or to content validity-related shortcomings.
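The dependence of alpha on test length can be illustrated with the standardized form of the coefficient, which is a function only of the number of items and their mean inter-item correlation. The following is a minimal sketch in Python; the function name and the mean correlation of 0.40 are hypothetical and chosen for illustration only, not estimates derived from the instruments reviewed here.

```python
def standardized_alpha(n_items: int, mean_r: float) -> float:
    """Standardized coefficient alpha for n_items items whose
    average inter-item correlation is mean_r."""
    if n_items < 2:
        raise ValueError("alpha requires at least two items")
    return n_items * mean_r / (1 + (n_items - 1) * mean_r)


# Hypothetical scenario: identical mean inter-item correlation,
# different numbers of items.
for k in (3, 6, 21):
    print(k, round(standardized_alpha(k, 0.40), 3))
```

With the mean inter-item correlation held constant at 0.40, alpha rises from about 0.67 with 3 items to 0.80 with 6 items and about 0.93 with 21 items. A long scale can therefore reach a high alpha without its items being strongly interrelated.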

A second important consideration regarding the scores that showed the highest levels of alpha concerns the CET, EDS, and EDQ. More specifically, none of these three scales was proposed as being either a unidimensional or a higher-order instrument (i.e., one including a number of first-order factors and one second-order factor). Indeed, evidence exists supporting the multidimensional rather than unidimensional nature of these instruments (Formby et al., 2014; Sicilia & González-Cutre, 2011). It is therefore surprising to find that the scores of these instruments (and their reliability in terms of alpha) have more often been computed on an aggregate basis than on a factor-by-factor basis. This is particularly concerning considering that, in instruments with correlated factors, the use of alpha should be limited to subscale scores and in no case extended to the overall test score (Cho, 2016; Cho & Kim, 2015). Therefore, should the overall score of any of the instruments under examination be defensible from a theoretical perspective, its reliability should be estimated using methodologically sounder alternatives to alpha (see Cho, 2016; Cho & Kim, 2015; Gignac, 2014).

A first point to note with regard to the instruments whose scores showed the lowest alpha estimates concerns the EAI, whose global score showed the lowest alpha estimate among those examined. One explanation for this finding may be that this instrument was developed around six specific theoretical components of behavioural addictions, with just one item proposed per component (Terry et al., 2004). However, the complex nature of some of these components may not be fully represented by a single item without resorting to complex or double-barrelled items (e.g., the item alluding to the conflicts arising between individuals and their “family and/or partner” because of the amount of exercise being engaged in). Such items may be subject to heterogeneous interpretation and, by extension, may contribute to the underlying latent construct to a lesser extent than those conceptualizing it more clearly (Hayes & Coutts, 2020; Kyriazos & Stalikas, 2018). This implies a violation of the tau-equivalence assumption required for unbiased estimation of alpha, so that this coefficient no longer reflects the actual reliability of the score but rather its lower bound (Hayes & Coutts, 2020). Consequently, the possibility exists that the reliability of the EAI score is higher than the value estimated in the present study. However, the lack of formal testing of the tau-equivalence assumption for the EAI’s items in the retrieved studies prevents us from providing empirical evidence that supports this possibility, the collection of which should be the subject of future research.
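The lower-bound property mentioned above can be made concrete by comparing alpha with coefficient omega at the population level. The sketch below is a minimal illustration in Python that assumes a single common factor with unit variance and uncorrelated errors. The function names and the loadings are hypothetical and are not estimates for the EAI or any other instrument.

```python
def alpha_and_omega(loadings, error_variances):
    """Population alpha and omega for a one-factor model with
    uncorrelated errors and a factor variance of 1."""
    k = len(loadings)
    if k < 2 or k != len(error_variances):
        raise ValueError("need matching lists with at least two items")
    # Variance of the sum score that is due to the common factor.
    common = sum(loadings) ** 2
    total = common + sum(error_variances)
    # Sum of the individual item variances implied by the model.
    item_variance_sum = sum(l * l + e for l, e in zip(loadings, error_variances))
    omega = common / total
    alpha = k / (k - 1) * (1 - item_variance_sum / total)
    return alpha, omega


def standardized_errors(loadings):
    """Error variances that give every item unit variance."""
    return [1 - l * l for l in loadings]


equal = [0.7] * 6                          # tau-equivalent: identical loadings
unequal = [0.8, 0.8, 0.7, 0.7, 0.6, 0.3]   # congeneric: one weakly related item

for label, lam in (("tau-equivalent", equal), ("congeneric", unequal)):
    a, w = alpha_and_omega(lam, standardized_errors(lam))
    print(f"{label}: alpha={a:.3f}, omega={w:.3f}")
```

Under equal loadings the two coefficients coincide, whereas with one weakly related item alpha (about 0.811) falls below omega (about 0.822). This is the sense in which alpha acts as a lower bound when tau-equivalence does not hold.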

A second point to note with regard to the instruments whose scores showed the lowest alpha estimates concerns the three subscale scores of the EDQ showing alpha values below the minimum 0.70 cut-off traditionally employed for discouraging the use of a given score (i.e., insight into problem, social reasons, and stereotyped behaviour). These findings are not entirely surprising considering the difficulty of achieving high alpha values with only a few items per subscale (i.e., from two to four) (Greco, O’Boyle, Cockburn, & Yuan, 2018). However, it is worth noting that, despite using a similarly small number of items, the scores on some of the other subscales examined (e.g., those of the EDS-21) showed higher levels of alpha than the three aforementioned EDQ subscales. These differences are probably explained by the way in which the content of the two instruments was developed: on the basis of the theoretical definition of the seven constructs being assessed (in the EDS-21), or by assigning the statements provided by exercisers concerning their exercise-related feelings and cognitions to the factors emerging from statistical analyses (in the EDQ). Therefore, the fact that the items included in these three EDQ subscales with particularly low alpha values did not derive from a predetermined theoretical approach could have meant grouping indicators that do not reflect an unequivocal underlying factor, leading to decreased measurement reliability. This is important because low reliability tends to attenuate the strength of the relationships being examined (Graham & Unterschute, 2015). Consequently, these findings raise the need to review the content and number of items included in these subscales in order to improve their reliability.
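The attenuating effect of low reliability can be illustrated with the classical correction-for-attenuation relationship, in which the observed correlation equals the true correlation multiplied by the square root of the product of the two reliabilities. The following Python sketch uses hypothetical function names. The value 0.615 is the lowest pooled subscale alpha reported in the present study, whereas the true correlation of 0.50 and the second reliability of 0.85 are assumptions made purely for illustration.

```python
import math


def _check_reliability(value: float) -> None:
    if not 0 < value <= 1:
        raise ValueError("reliability must lie in the interval (0, 1]")


def attenuated_correlation(true_r: float, rel_x: float, rel_y: float) -> float:
    """Correlation expected between two observed scores, given the
    correlation between the underlying true scores."""
    _check_reliability(rel_x)
    _check_reliability(rel_y)
    return true_r * math.sqrt(rel_x * rel_y)


def disattenuated_correlation(observed_r: float, rel_x: float, rel_y: float) -> float:
    """Observed correlation corrected for unreliability in both scores."""
    _check_reliability(rel_x)
    _check_reliability(rel_y)
    return observed_r / math.sqrt(rel_x * rel_y)


# Hypothetical true correlation of 0.50 between a PE subscale and a criterion.
observed = attenuated_correlation(0.50, rel_x=0.615, rel_y=0.85)
print(round(observed, 3))
print(round(disattenuated_correlation(observed, 0.615, 0.85), 3))
```

The second function is the form typically used when correcting effect sizes for measurement-related artefacts in meta-analytic research, which is why the choice of reliability estimate matters for such corrections.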

Moderators of the reliability scores of self-report instruments of PE

Evidence supported a relationship between some of the characteristics of the studies evaluated and the variability in alpha estimates. For example, higher alpha values were found for the global scores of the CES-VAS and the avoidance and rule-driven behaviour subscale of the CET among clinical populations in terms of eating disorders. These findings are relatively unsurprising given that both instruments include content of particular relevance to individuals with eating disorders, such as the negative consequences of being unable to exercise, especially feelings of guilt (Davis et al., 1993; Scharmer et al., 2020; Taranis et al., 2011; Zeeck et al., 2017). It follows that comparisons of the scores derived from these two instruments between individuals with and without a clinical eating disorder diagnosis may be susceptible to bias.

Findings also suggested that the alpha values of the global scores of the CET and the OEQ may be lower among populations comprising regular exercisers. Moreover, it should be noted that the CET was developed with a particular focus on excessive exercise within the eating disorders domain. Therefore, the possibility exists that some of the content included in the instrument (e.g., exercising due to weight/appearance reasons or to the lack of enjoyment when exercising; Taranis et al., 2011) may not be equally relevant for non-clinical populations in terms of eating disorders (Alcaraz-Ibáñez, Sicilia, Dumitru, Paterna, & Griffiths, 2019). Additionally, the lower alpha values obtained for OEQ scores among regular exercisers may be due to the low potential variability of some of the instrument’s items among those featuring very low levels of exercise. Clear examples are items referring to exercise frequency (e.g., exercising on a daily basis) or specific exercise-related habits (e.g., keeping a record of exercise performance) (Pasman & Thompson, 1988). Taken together, these results reinforce the notion that differences in the interpretation of the content of self-report instruments assessing PE may exist among individuals with unequal levels of exercise involvement (Szabo et al., 2015).

Exercise modality is another exercise-related feature that appeared to be related to the variability in alpha estimates (specifically, those of the global scores of the EDS-21). In particular, results suggested that alpha values were lower in studies reporting very precise exercise modalities compared to those that did not. However, the fact that the instrument scores under consideration were found to be similarly reliable in terms of alpha values suggests that comparisons across modalities could be reasonably made. This is important given that this kind of comparison has been a matter of research interest (Di Lodovico et al., 2019).

Findings also suggested that the alpha estimates of linguistically adapted versions may be lower than those of the original versions in the case of the CES-VAS and EAI global scores and several EDS-21 subscale scores. These findings suggest the existence of possible weaknesses in the linguistic adaptation processes. However, it should be noted that cross-cultural and cross-linguistic research in this field is scarce (Griffiths et al., 2015). Consequently, further research is needed that examines the extent to which the psychometric properties of the scores of the self-report instruments assessing PE are equivalent across their different linguistic adaptations.

No conclusive evidence was found linking the proportion of females included in the samples with the alpha estimates of the global scores of the instruments under consideration. This suggests that the reliability of such scores does not greatly differ between males and females. However, this was not the case for some of the subscale scores (i.e., weight control and exercise rigidity subscales of the CET; and tolerance, lack of control, and time subscales of the EDS-21). Indeed, evidence suggested that the higher the proportion of females in the sample, the higher the alpha estimates of these subscale scores. Therefore, the reliability of these scores may be lower for males than for females. These findings are relevant considering that gender has been proposed as a potential risk factor for several potentially addictive behaviours and, particularly, PE (Bueno-Antequera et al., 2020; Cunningham, Pearman, & Brewerton, 2016). The existence of gender differences in score reliability may have led to biased estimates in comparisons involving these two population groups.

A last notable group of findings emerging from moderator analyses concerns continuous variables. The fact that no evidence was obtained relating alpha values to mean scores on the scales suggests that the reliability of the scores examined is likely to be similar among individuals with very different levels of self-reported PE. An exception to this general trend was the negative relationship observed between the mean scores and the associated reliability values in the case of the EAI. This is important because it suggests that the reliability of the EAI scores may decrease among individuals scoring high on this instrument. This might be explained by evidence suggesting that individuals with similarly high levels of PE on the EAI may differ markedly on the score for the item reflecting conflict (Chamberlain & Grant, 2020; Sicilia, Alcaraz-Ibáñez, Chiminazzo, & Fernandes, 2020). This may imply a decreased level of inter-correlations among items and, by extension, a decrease in alpha values (Greco et al., 2018).

Finally, it is worth noting that the variance of the scores under consideration was found to be positively related to alpha estimates in just three cases (i.e., the avoidance and mood modification subscales of the CET, and the tolerance subscale of the EDS-21). These findings are somewhat unexpected considering that psychometric theory points to score variance as one of the main components of reliability estimation (Nunnally & Bernstein, 1994). From this, it follows that the population characteristics already discussed here may help explain the variability of alpha to a greater extent than the standard deviation of the scores. On balance, findings from the moderator analyses underscore the need to examine reliability in each of the groups involved in cross-group comparisons on self-reported PE symptoms.
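The theoretical expectation referred to above follows from the classical test theory definition of reliability as one minus the ratio of error variance to observed-score variance. The Python sketch below illustrates this under the assumption that the error variance stays constant across samples. The function name and all numerical values are hypothetical and do not come from the studies reviewed.

```python
def reliability_from_sd(observed_sd: float, error_sd: float) -> float:
    """Classical test theory reliability: 1 - error variance / observed variance."""
    observed_var = observed_sd ** 2
    error_var = error_sd ** 2
    if observed_var <= 0 or error_var > observed_var:
        raise ValueError("observed variance must be positive and at least the error variance")
    return 1 - error_var / observed_var


# Hypothetical: same measurement error (SD = 2.0) in a homogeneous
# sample (SD = 4.0) and in a more heterogeneous one (SD = 6.0).
for sd in (4.0, 6.0):
    print(sd, round(reliability_from_sd(sd, error_sd=2.0), 3))
```

Here reliability is 0.75 in the homogeneous sample and about 0.889 in the heterogeneous one, which is the positive relationship between score variance and reliability that theory would lead one to expect.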

Reliability reporting practices in studies using self-report assessment of problematic exercise

The global induction rate found in the present study (i.e., 47.58%) appears to be slightly higher than the one reported for exercise psychology research more generally (i.e., 41.20%; Wilson, Mack, & Sylvester, 2011). It is worth noting that induction rates above the mean were found for the instruments whose scores showed the lowest values of alpha at the global level (i.e., EAI) and subscale level (i.e., EDQ). This suggests that information concerning reliability in this field may be more likely to be omitted for those scores with lower values of alpha. In the case of the EAI, one explanation for these findings may be that this instrument has been used not only for providing a continuous score representing the construct of interest but also as a screening instrument for the purpose of distinguishing individuals at-risk from those having some or no symptoms of exercise addiction. Therefore, the possibility exists that the focus on classifying individuals on the basis of a fixed cut-off point may have led some authors to overlook the issue of examining the reliability of the instrument’s global score.

A particularly worrying issue in view of the highly prevalent use of alpha is the almost non-existent testing of the assumptions required for its unbiased employment. Researchers in this field may opt instead to use the reliability index that is most appropriate to the data (Cho & Kim, 2015). A misconception that may deter researchers from approaching this task is the alleged difficulty of both testing the assumptions of alpha and using the alternative methods required when its assumptions are violated (Cho, 2016; Hayes & Coutts, 2020). However, it should be noted that convenient practical guidelines for addressing these tasks have been provided, with some involving relatively non-complex tools (e.g., spreadsheet-based solutions; Cho, 2016) or software that is familiar to large numbers of researchers (e.g., SPSS; Hayes & Coutts, 2020).

Limitations

Despite the many strengths of the present review, there are a number of limitations. A first group of limitations concerns the limited data available on the population characteristics examined as potential moderators. For example, the small number of studies reporting reliability estimates in some populations meant that, in many cases, only a small number of primary estimates were available. This prevented providing a higher level of evidence for some of the moderation analyses conducted or even, in some cases, carrying them out at all. The latter was the case for the EDQ, for which it was impossible to examine the variables that may contribute to the variability of the alpha estimates of its global and subscale scores. Also related to the limited availability of data were the characteristics of the study participants. For example, more studies omitted information on the exercise modalities or minimum exercise levels of the participants than provided it. These omissions are particularly relevant in view of the limited amount of variance (i.e., <50%) explained by some of the regression models aimed at exploring the potential sources of variability in the alpha estimates, because these relatively low levels of explained variance point towards the existence of other important moderator variables beyond those considered in the present study. This scarcity of data is also relevant given that the results pointed to some of the variables for which limited data were available (e.g., region or exercise modality) as potential moderators of the alpha estimates under consideration. In view of these limitations, two suggestions can be made. Firstly, researchers in this field should pay particular attention to reporting the characteristics of study participants. This means providing sociodemographic information that, in view of the present findings, may be of interest due to its likely influence on the reliability of the scores in terms of coefficient alpha. Examples include the type of survey, volume of exercise, and the main exercise modality practised. Moreover, it would be particularly useful to provide specific information for the subgroups identified on the basis of these or other sociodemographic variables, because this would facilitate further meta-analytical research. Secondly, more research is needed that examines the reliability of the scores of self-report instruments assessing PE among populations for which limited evidence is currently available. Depending on the instrument, this would involve regions or linguistic contexts that are still under-represented, as well as clinical populations in terms of eating disorders.

A second important limitation is that virtually no primary studies reported test-retest reliability. This prevented the provision of summarized evidence on the consistency of instrument scores over time. Therefore, further primary research is needed examining the reliability of the test scores under consideration in terms of temporal stability. Finally, it is worth mentioning the lack of testing of the assumptions required for the unbiased use of alpha. This makes it advisable to treat the results presented with caution, particularly in the case of the global scores of instruments that are not clearly unidimensional (i.e., EDQ, CET, and EDS-21).

Conclusions and practical implications

First, the alpha estimates of the global and subscale scores of existing self-report instruments assessing PE vary considerably, not just from one instrument to another but also across different applications of the same instrument. Indeed, in most cases, the 95% CI of the summarized alpha estimates obtained in the present study did not contain the alpha values reported in the studies in which the instruments under consideration were originally proposed. Therefore, the originally reported alpha values may not be the most adequate ones either for comparison with those obtained in primary research or for correcting measurement-related artefacts in quantitative meta-analytic research. It is therefore suggested that the values provided in the present study be used for such purposes.

Second, the reliability of test scores of existing self-report instruments assessing PE appears to be particularly sensitive to the characteristics of the study population. Researchers including self-report PE instruments in their studies are encouraged to report specific reliability estimates for the different population groups of interest. This would provide insight into the potential for cross-group comparisons to be biased by between-group differences in reliability. Future research efforts aimed at refining existing instruments or proposing new ones should include not just one or two convenience samples but several groups defined according to the characteristics shown to be related to the variability in alpha estimates (e.g., clinical condition in terms of eating disorders, language, and exercise modality). This would make it possible to examine the extent to which the instrument’s scores are acceptably reliable across a minimum number of target groups of interest and, where this is not the case, to refine the instrument at an early stage of development.

Third, existing quantitative research using self-report instruments assessing PE suffers from two main deficiencies in terms of reliability reporting: (i) the frequent omission of reliability estimates for the data at hand; and (ii) the (almost exclusive) employment of alpha without proper testing of the assumptions necessary for its unbiased use, or even when the nature of the test to be examined would make its use particularly unsuitable. Researchers, journal editors, and reviewers should be aware of the need to report the reliability of scores derived from instruments assessing PE for the data at hand in all primary research. Moreover, the suitability of the reliability index used should be justified on the basis of the theoretical nature of the constructs under consideration and the characteristics of the data being examined, for example, in terms of test dimensionality and measurement model.

Funding sources

This research is part of the I+D+I project (grant number PID2019-107674RB-I00), funded by Ministerio de Ciencia e Innovación (MCIN), Agencia Estatal de Investigación (AEI/10.13039/501100011033), Spain. AP (FPU18/01055) is funded by MCIN/AEI/10.13039/501100011033 and Fondo Social Europeo (FSE) “El FSE invierte en tu futuro”. MAI (UAL RRA202101) is funded by Ministerio de Universidades (Plan de Recuperación, Transformación y Resiliencia, Next Generation EU).

Authors’ contribution

AP and MAI designed the study, performed the systematic search and data extraction, completed all statistical analyses and initial drafts of the manuscript. AS and MDG contributed to the drafting of the manuscript and revisions. All authors assisted with drafting of the final version of the manuscript, including critical revisions for intellectual content.

Conflicts of interest

The authors declare no potential conflicts of interest with respect to the research, authorship and/or publication of this article.

Supplementary material

Supplementary data to this article can be found online at https://doi.org/10.1556/2006.2022.00014.

References

  • Alcaraz-Ibáñez, M., Aguilar-Parra, J. M., & Álvarez-Hernández, J. F. (2018). Exercise addiction: Preliminary evidence on the role of psychological inflexibility. International Journal of Mental Health and Addiction, 16(1), 199–206. https://doi.org/10.1007/s11469-018-9875-y.
  • Alcaraz-Ibáñez, M., Paterna, A., Sicilia, A., & Griffiths, M. D. (2020). Morbid exercise behaviour and eating disorders: A meta-analysis. Journal of Behavioral Addictions, 9(2), 206–224. https://doi.org/10.1556/2006.2020.00027.
  • Alcaraz-Ibáñez, M., Paterna, A., Sicilia, A., & Griffiths, M. D. (2021). A systematic review and meta-analysis on the relationship between body dissatisfaction and morbid exercise behaviour. International Journal of Environmental Research and Public Health, 18, 585. https://doi.org/10.3390/ijerph18020585.
  • Alcaraz-Ibáñez, M., Sicilia, A., Dumitru, D. C., Paterna, A., & Griffiths, M. D. (2019). Examining the relationship between fitness-related self-conscious emotions, disordered eating symptoms, and morbid exercise behavior: An exploratory study. Journal of Behavioral Addictions, 8(3), 603–612. https://doi.org/10.1556/2006.8.2019.43.
  • Alchieri, J. C., Gouveia, V. V., de Oliveira, I. C. V., de Medeiros, E. D., Grangeiro, A. S. de M., & da Silva, C. F. de L. S. (2015). Exercise dependence scale: Adaptação e evidências de validade e precisão. Jornal Brasileiro de Psiquiatria, 64(4), 279–287. https://doi.org/10.1590/0047-2085000000090.
  • American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders. DSM-IV (4th ed.). American Psychiatric Association.
  • Bonett, D. G. (2002). Sample size requirements for testing and estimating coefficient alpha. Journal of Educational and Behavioral Statistics, 27(4), 335–340. https://doi.org/10.3102/10769986027004335.
  • Bueno-Antequera, J., Mayolas-Pi, C., Reverter-Masià, J., López-Laval, I., Oviedo-Caro, M. Á., Munguía-Izquierdo, D., … Legaz-Arrese, A. (2020). Exercise addiction and its relationship with health outcomes in indoor cycling practitioners in fitness centers. International Journal of Environmental Research and Public Health, 17, 4159. https://doi.org/10.3390/ijerph17114159.
  • Bull, F. C., Al-Ansari, S. S., Biddle, S., Borodulin, K., Buman, M. P., Cardon, G., … Willumsen, J. F. (2020). World Health Organization 2020 guidelines on physical activity and sedentary behaviour. British Journal of Sports Medicine, 54(24), 1451–1462. https://doi.org/10.1136/bjsports-2020-102955.
  • Chamberlain, S. R., & Grant, J. E. (2020). Is problematic exercise really problematic? A dimensional approach. CNS Spectrums, 25(1), 64–70. https://doi.org/10.1017/S1092852919000762.
  • Cho, E. (2016). Making reliability reliable: A systematic approach to reliability coefficients. Organizational Research Methods, 19(4), 651–682. https://doi.org/10.1177/1094428116656239.
  • Cho, E., & Kim, S. (2015). Cronbach’s coefficient alpha: Well known but poorly understood. Organizational Research Methods, 18(2), 207–230. https://doi.org/10.1177/1094428114555994.
  • Cicchetti, D. V. (1994). Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessments instruments in psychology. Psychological Assessment, 6, 284–290. https://doi.org/10.1037/1040-3590.6.4.28.
  • Cunningham, H. E., Pearman, S., & Brewerton, T. D. (2016). Conceptualizing primary and secondary pathological exercise using available measures of excessive exercise. International Journal of Eating Disorders, 49(8), 778–792. https://doi.org/10.1002/eat.22551.
  • Davis, C., Brewer, H., & Ratusny, D. (1993). Behavioral frequency and psychological commitment: Necessary concepts in the study of excessive exercising. Journal of Behavioral Medicine, 16(6), 611–628. https://doi.org/10.1007/BF00844722.
  • Di Lodovico, L., Poulnais, S., & Gorwood, P. (2019). Which sports are more at risk of physical exercise addiction: A systematic review. Addictive Behaviors, 93, 257–262. https://doi.org/10.1016/j.addbeh.2018.12.030.
  • Ding, D., Lawson, K. D., Kolbe-Alexander, T. L., Finkelstein, E. A., Katzmarzyk, P. T., van Mechelen, W., & Pratt, M. (2016). The economic burden of physical inactivity: A global analysis of major non-communicable diseases. The Lancet, 388(10051), 1311–1324. https://doi.org/10.1016/S0140-6736(16)30383-X.
  • Downs, D. S., Hausenblas, H. A., & Nigg, C. R. (2004). Factorial validity and psychometric examination of the exercise dependence scale-revised. Measurement in Physical Education and Exercise Science, 8(4), 183–201. https://doi.org/10.1207/s15327841mpee0804.
  • Egan, S. J., Bodill, K., Watson, H. J., Valentine, E., Shu, C., & Hagger, M. S. (2017). Compulsive exercise as a mediator between clinical perfectionism and eating pathology. Eating Behaviors, 24, 11–16. https://doi.org/10.1016/j.eatbeh.2016.11.001.
  • Formby, P., Watson, H. J., Hilyard, A., Martin, K., & Egan, S. J. (2014). Psychometric properties of the Compulsive Exercise Test in an adolescent eating disorder population. Eating Behaviors, 15(4), 555–557. https://doi.org/10.1016/j.eatbeh.2014.08.013.
  • Fu, R., Gartlehner, G., Grant, M., Shamliyan, T., Sedrakyan, A., & Wilt, T. J., et al. (2011). Conducting quantitative synthesis when comparing medical interventions: AHRQ and the effective health care program. Journal of Clinical Epidemiology, 64, 1187–1197. https://doi.org/10.1016/j.jclinepi.2010.08.010.
  • Gignac, G. E. (2014). On the inappropriateness of using items to calculate total scale score reliability via coefficient alpha for multidimensional scales. European Journal of Psychological Assessment, 30(2), 130–139. https://doi.org/10.1027/1015-5759/a000181.
  • Graham, J. M., & Unterschute, M. S. (2015). A reliability generalization meta-analysis of self-report measures of adult attachment. Journal of Personality Assessment, 97(1), 31–41. https://doi.org/10.1080/00223891.2014.927768.
  • Greco, L. M., O’Boyle, E. H., Cockburn, B. S., & Yuan, Z. (2018). Meta-analysis of coefficient alpha: A reliability generalization study. Journal of Management Studies, 55(4), 583–618. https://doi.org/10.1111/joms.12328.
  • Griffiths, M. D., Szabo, A., & Terry, A. (2005). The exercise addiction inventory: A quick and easy screening tool for health practitioners. British Journal of Sports Medicine, 39(6), 1–3. https://doi.org/10.1136/bjsm.2004.017020.
  • Griffiths, M. D., Urbán, R., Demetrovics, Z., Lichtenstein, M. B., de la Vega, R., Kun, B., … Szabo, A. (2015). A cross-cultural re-evaluation of the Exercise Addiction Inventory (EAI) in five countries. Sports Medicine - Open, 1(5), 1–7. https://doi.org/10.1186/s40798-014-0005-5.
  • Hausenblas, H. A., & Downs, D. S. (2002). How much is too much? The development and validation of the exercise dependence scale. Psychology & Health, 17(4), 387–404. https://doi.org/10.1080/0887044022000004894.
  • Hayes, A. F., & Coutts, J. J. (2020). Use omega rather than Cronbach’s alpha for estimating reliability. But…. Communication Methods and Measures, 14(1), 1–24. https://doi.org/10.1080/19312458.2020.1718629.
  • Higgins, J. P. T., Thompson, S. G., Deeks, J. J., & Altman, D. G. (2003). Measuring inconsistency in meta-analyses testing for heterogeneity. BMJ, 327, 557–560. https://doi.org/10.1136/bmj.327.7414.557.
  • Hoekstra, R., Vugteveen, J., Warrens, M. J., & Kruyen, P. M. (2019). An empirical analysis of alleged misunderstandings of coefficient alpha. International Journal of Social Research Methodology, 22(4), 351–364. https://doi.org/10.1080/13645579.2018.1547523.
  • Juwono, I. D. , & Szabo, A. (2021). 100 cases of exercise addiction: More evidence for a widely researched but rarely identified dysfunction. International Journal of Mental Health and Addiction, 19, 1799–1811. https://doi.org/10.1007/s11469-020-00264-6.

  • Kern, L. (2007). Validation de l’adaptation française de l’échelle de dépendance à l’exercice physique: l’EDS-R. Pratiques Psychologiques, 13(4), 425–441. https://doi.org/10.1016/j.prps.2007.06.003.

  • Kern, L. , & Baudin, N. (2011). Validation française du questionnaire de dépendance de l’exercice physique (Exercise Dependence Questionnaire). Revue Européenne de Psychologie Appliquée, 61(4), 205–211. https://doi.org/10.1016/j.erap.2011.08.001.

  • Knapp, G. , & Hartung, J. (2003). Improved tests for a random effects meta-regression with a single covariate. Statistics in Medicine, 22(17), 2693–2710. https://doi.org/10.1002/sim.1482.

  • Kyriazos, T. A. , & Stalikas, A. (2018). Applied psychometrics: The steps of scale development and standardization process. Psychology, 09(11), 2531–2560. https://doi.org/10.4236/psych.2018.911145.

  • Lichtenstein, M. B. , & Jensen, T. T. (2016). Exercise addiction in CrossFit: Prevalence and psychometric properties of the exercise addiction inventory. Addictive Behaviors Reports, 3, 33–37. https://doi.org/10.1016/j.abrep.2016.02.002.

  • Lin, L. (2018). Bias caused by sampling error in meta-analysis with small sample sizes. PloS One, 13(9), e0204056. https://doi.org/10.1371/journal.pone.0204056.

  • Li, M. , Nie, J. , & Ren, Y. (2016). Verification of Exercise Addiction Inventory for Chinese college students based on SEM model. International Journal of Simulation: Systems, Science and Technology, 17(12), 21.1–21.6. https://doi.org/10.5013/IJSSST.a.17.12.21.

  • Lipsey, M. W. , & Wilson, D. (2001). Practical meta analysis. In Applied social research methods series. Sage Publications.

  • Marques, A. , Peralta, M. , Sarmento, H. , Loureiro, V. , Gouveia, É. R. , & Gaspar de Matos, M. (2019). Prevalence of risk for exercise dependence: A systematic review. Sports Medicine, 49(2), 319–330. https://doi.org/10.1007/s40279-018-1011-4.

  • Meule, A. , Schrambke, D. , Furst Loredo, A. , Schlegl, S. , Naab, S. , & Voderholzer, U. (2021). Inpatient treatment of anorexia nervosa in adolescents: A 1-year follow-up study. European Eating Disorders Review, 29, 165–177. https://doi.org/10.1002/erv.2808.

  • Moher, D. , Liberati, A. , Tetzlaff, J. , & Altman, D. G. (2009). Preferred reporting items for systematic reviews and meta-analyses: The PRISMA statement. Plos Medicine, 6(7), e1000097. https://doi.org/10.1371/journal.pmed.1000097.

  • Mokkink, L. B. , Terwee, C. B. , Patrick, D. L. , Alonso, J. , Stratford, P. W. , Knol, D. L. , Bouter, L. M. , & de Vet, H. C. W. (2010). The COSMIN study reached international consensus on taxonomy, terminology, and definitions of measurement properties for health related patient-reported outcomes. Journal of Clinical Epidemiology, 63(7), 737–745. https://doi.org/10.1016/j.jclinepi.2010.02.006.

  • Mónok, K. , Berczik, K. , Urbán, R. , Szabo, A. , Griffiths, M. D. , Farkas, J. , … Demetrovics, Z. (2012). Psychometric properties and concurrent validity of two exercise addiction measures: A population wide study. Psychology of Sport and Exercise, 13(6), 739–746. https://doi.org/10.1016/j.psychsport.2012.06.003.

  • Nunnally, J. C. , & Bernstein, I. (1994). Psychometric theory. McGraw-Hill.

  • Ogden, J. , Veale, D. , & Summers, Z. (1997). The development and validation of the exercise dependence questionnaire. Addiction Research, 5(4), 343–356. https://doi.org/10.3109/16066359709004348.

  • Page, M. J. , Higgins, J. P. T. , & Sterne, J. A. C. (2019). Assessing risk of bias due to missing results in a synthesis. In J. Higgins , J. Thomas , J. Chandler , M. Cumpston , T. Li , M. Page , & V. Welch (Eds.), Cochrane handbook for systematic reviews of interventions. Cochrane.

  • Panayides, P. (2013). Coefficient alpha: Interpret with caution. Europe’s Journal of Psychology, 9(4), 687–696. https://doi.org/10.5964/ejop.v9i4.653.

  • Parastatidou, I. S. , Doganis, G. , Theodorakis, Y. , & Vlachopoulos, S. P. (2012). Addicted to exercise: Psychometric properties of the exercise dependence scale-revised in a sample of Greek exercise participants. European Journal of Psychological Assessment, 28(1), 3–10. https://doi.org/10.1027/1015-5759/a000084.

  • Pasman, L. , & Thompson, J. K. (1988). Body image and eating disturbance in obligatory runners, obligatory weightlifters, and sedentary individuals. International Journal of Eating Disorders, 7(6), 759–769. https://doi.org/10.1002/1098-108X(198811)7:6<759::AID-EAT2260070605>3.0.CO;2-G.

  • Pigott, T. D. (2012). Advances in meta-analysis. Springer. https://doi.org/10.1007/978-1-4614-2278-5.

  • Rosenthal, R. (1995). Writing meta-analytic reviews. Psychological Bulletin, 118(2), 183–192. https://doi.org/10.1037/0033-2909.118.2.183.

  • Rubio-Aparicio, M. , Badenes-Ribera, L. , Sánchez-Meca, J. , Fabris, M. A. , & Longobardi, C. (2020). A reliability generalization meta-analysis of self-report measures of muscle dysmorphia. Clinical Psychology: Science and Practice, 27(1), 1–24. https://doi.org/10.1111/cpsp.12303.

  • Sánchez-Meca, J. , López-López, J. A. , & López-Pina, J. A. (2013). Some recommended statistical analytic practices when reliability generalization studies are conducted. British Journal of Mathematical and Statistical Psychology, 66(3), 402–425. https://doi.org/10.1111/j.2044-8317.2012.02057.x.

  • Sauchelli, S. , Arcelus, J. , Granero, R. , Jiménez-Murcia, S. , Agüera, Z. , Del Pino-Gutiérrez, A. , & Fernández-Aranda, F. (2016). Dimensions of compulsive exercise across eating disorder diagnostic subtypes and the validation of the Spanish version of the Compulsive Exercise Test. Frontiers in Psychology, 7, 1852. https://doi.org/10.3389/fpsyg.2016.01852.

  • Scharmer, C. , Gorrell, S. , Schaumberg, K. , & Anderson, D. A. (2020). Compulsive exercise or exercise dependence? Clarifying conceptualizations of exercise in the context of eating disorder pathology. Psychology of Sport and Exercise, 46, 101586. https://doi.org/10.1016/j.psychsport.2019.101586.

  • Shin, K. , & You, S. (2015). Factorial validity of the Korean version of the exercise dependence scale–revised. Perceptual and Motor Skills, 121(3), 889–899. https://doi.org/10.2466/03.08.PMS.121c27x8.

  • Sicilia, A. , Alcaraz-Ibáñez, M. , Chiminazzo, J. G. C. , & Fernandes, P. T. (2020). Latent profile analysis of exercise addiction symptoms in Brazilian adolescents: Association with health-related variables. Journal of Affective Disorders, 273, 223–230. https://doi.org/10.1016/j.jad.2020.04.019.

  • Sicilia, A. , Alcaraz-Ibáñez, M. , Lirola, M. J. , Burgueño, R. , & Maher, A. (2018). Exercise motivational regulations and exercise addiction: The mediating role of passion. Journal of Behavioral Addictions, 7(2), 482–492. https://doi.org/10.1556/2006.7.2018.36.

  • Sicilia, A. , Alías-García, A. , Ferriz, R. , & Moreno-Murcia, J. A. (2013). Spanish adaptation and validation of the exercise addiction inventory (EAI). Psicothema, 25(3), 377–383. https://doi.org/10.7334/psicothema2013.21.

  • Sicilia, A. , Bracht, V. , Penha, V. , Almeida, U. R. , Ferriz, R. , & Alcaraz-Ibáñez, M. (2017). Propiedades psicométricas del Exercise Addiction Inventory (EAI) en una muestra de estudiantes brasileños universitarios [Psychometric properties of the Exercise Addiction Inventory (EAI) in a sample of Brazilian university students]. Universitas Psychologica, 16(2), 176–185. https://doi.org/10.11144/Javeriana.upsy16-2.ppea.

  • Sicilia, A. , & González-Cutre, D. (2011). Dependence and physical exercise: Spanish validation of the exercise dependence scale-revised (EDS-R). The Spanish Journal of Psychology, 14(1), 421–431. https://doi.org/10.5209/rev_SJOP.2011.v14.n1.38.

  • Sicilia, A. , Paterna, A. , Alcaraz-Ibáñez, M. , & Griffiths, M. D. (2021). Theoretical conceptualisations of problematic exercise in psychometric assessment instruments: A systematic review. Journal of Behavioral Addictions, 10(1), 4–20. https://doi.org/10.1556/2006.2021.00019.

  • Slaney, K. (2017). Validating psychological constructs. Palgrave Macmillan. https://doi.org/10.1057/978-1-137-38523-9.

  • Szabo, A. , Demetrovics, Z. , & Griffiths, M. D. (2018). Morbid exercise behavior: Addiction or psychological escape? In H. Budde & M. Wegner (Eds.), The exercise effect on mental health: Neurobiological mechanisms (pp. 277–311). Routledge.

  • Szabo, A. , Griffiths, M. D. , de La Vega Marcos, R. , Mervó, B. , & Demetrovics, Z. (2015). Methodological and conceptual limitations in exercise addiction research. Yale Journal of Biology and Medicine, 88, 303–308.

  • Szabo, A. , Pinto, A. , Griffiths, M. D. , Kovácsik, R. , & Demetrovics, Z. (2019). The psychometric evaluation of the Revised Exercise Addiction Inventory: Improved psychometric properties by changing item response rating. Journal of Behavioral Addictions, 8(1), 157–161. https://doi.org/10.1556/2006.8.2019.06.

  • Taranis, L. , Touyz, S. , & Meyer, C. (2011). Disordered eating and exercise: Development and preliminary validation of the compulsive exercise test (CET). European Eating Disorders Review, 19(3), 256–268. https://doi.org/10.1002/erv.1108.

  • Terry, A. , Szabo, A. , & Griffiths, M. D. (2004). The exercise addiction inventory: A new brief screening tool. Addiction Research and Theory, 12(5), 489–499. https://doi.org/10.1080/16066350310001637363.

  • Trott, M. , Yang, L. , Jackson, S. E. , Firth, J. , Gillvray, C. , Stubbs, B. , & Smith, L. (2020). Prevalence and correlates of exercise addiction in the presence vs. absence of indicated eating disorders. Frontiers in Sports and Active Living, 2(84). https://doi.org/10.3389/fspor.2020.00084.

  • Vacha-Haase, T. , Henson, R. K. , & Caruso, J. C. (2002). Reliability generalization: Moving toward improved understanding and use of score reliability. Educational and Psychological Measurement, 62(4), 562–569. https://doi.org/10.1177/0013164402062004002.

  • Vacha-Haase, T. , Kogan, L. R. , & Thompson, B. (2000). Sample compositions and variabilities in published studies versus those of test manuals: Validity of score reliability inductions. Educational and Psychological Measurement, 60(4), 502–522. https://doi.org/10.1177/00131640021970682.

  • Vacha-Haase, T. , & Thompson, B. (2011). Score reliability: A retrospective look back at 12 years of reliability generalization studies. Measurement and Evaluation in Counseling and Development, 44(3), 159–168. https://doi.org/10.1177/0748175611409845.

  • Vicent, M. , Rubio-Aparicio, M. , Sánchez-Meca, J. , & Gonzálvez, C. (2019). A reliability generalization meta-analysis of the child and adolescent perfectionism scale. Journal of Affective Disorders, 245, 533–544. https://doi.org/10.1016/j.jad.2018.11.049.

  • Wilson, P. M. , Mack, D. E. , & Sylvester, B. (2011). When a little myth goes a long way: The use (or misuse) of cut-points, interpretations, and discourse with coefficient-alpha in exercise psychology. In A. M. Columbus (Ed.), Advances in psychology research (pp. 1–17). Nova Science Publishers.

  • Zeeck, A. , Schlegel, S. , Giel, K. E. , Junne, F. , Kopp, C. , Joos, A. , … Hartmann, A. (2017). Validation of the German version of the commitment to exercise scale. Psychopathology, 50(2), 146–156. https://doi.org/10.1159/000455929.


Supplementary Materials

  • Alcaraz-Ibáñez, M. , Aguilar-Parra, J. M. , & Álvarez-Hernández, J. F. (2018). Exercise addiction: Preliminary evidence on the role of psychological inflexibility. International Journal of Mental Health and Addiction, 16(1), 199–206. https://doi.org/10.1007/s11469-018-9875-y.

  • Alcaraz-Ibáñez, M. , Paterna, A. , Sicilia, A. , & Griffiths, M. D.