The development of the Compulsive Sexual Behavior Disorder Scale (CSBD-19): An ICD-11 based screening measure across three languages

Abstract
Background: Compulsive Sexual Behavior Disorder (CSBD) is included in the eleventh edition of The International Classification of Diseases (ICD-11) as an impulse-control disorder.
Aims: The aim of the present work was to develop a scale (the Compulsive Sexual Behavior Disorder Scale, CSBD-19) that can reliably and validly assess CSBD based on ICD-11 diagnostic guidelines.
Method: Four independent samples totaling 9,325 individuals from three countries (the United States, Hungary, and Germany) completed self-report measures. The psychometric properties of the CSBD-19 were examined in terms of factor structure, reliability, measurement invariance, and theoretically relevant correlates. A potential threshold was determined to identify individuals with an elevated risk of CSBD.
Results: The five-factor model of the CSBD-19 (i.e., control, salience, relapse, dissatisfaction, and negative consequences) had an excellent fit to the data and demonstrated appropriate associations with the correlates. Measurement invariance suggested that the CSBD-19 functions similarly across languages. Men had higher means than women. A score of 50 points was identified as an optimal threshold to identify individuals at high risk of CSBD.
Conclusions: The CSBD-19 is a short, valid, and reliable measure of potential CSBD based on ICD-11 diagnostic guidelines. Its use in large-scale, cross-cultural studies may promote the identification and understanding of individuals with a high risk of CSBD.


INTRODUCTION
Six years after the exclusion of hypersexual disorder (HD) from the fifth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) (American Psychiatric Association, 2013; Kafka, 2010), Compulsive Sexual Behavior Disorder (CSBD) has been included as a new diagnostic entity in the eleventh edition of The International Statistical Classification of Diseases (ICD-11) (World Health Organization, 2019). The inclusion followed extensive theoretical debates about the classification and conceptualization of compulsive sexual behaviors (CSB), and considerable discussion centered on the poor conceptualization of CSB (Fuss et al., 2019). In the ICD-11, CSBD is characterized by a persistent pattern of failure to control intense, repetitive sexual impulses or urges, resulting in repetitive sexual behavior over an extended period (six months or more) that generates marked distress or impairment in personal, family, social, educational, occupational, or other important areas of functioning (Kraus et al., 2018). The diagnostic guidelines of CSBD include several criteria from the previously proposed HD diagnosis, but also important differences, such as a criterion focused on diminished satisfaction and consideration of moral incongruence, reflecting previous criticism regarding the HD diagnosis (Kafka, 2014) (Appendix 1). Although several scales were developed in the past few decades aiming to assess CSB (Bőthe, Kovács, et al., 2019b; Montgomery-Graham, 2017; Stewart & Fedoroff, 2014), no scale exists that assesses CSBD based on ICD-11 guidelines. Thus, the development of a new, valid, and reliable scale (Compulsive Sexual Behavior Disorder Scale, CSBD-19) that assesses the ICD-11 diagnostic guidelines of CSBD (and does not measure prior criteria such as emotion regulation) across different countries (Király et al., 2019) is necessary for both clinical practice and research purposes.
Previous systematic reviews (Karila et al., 2014; Montgomery-Graham, 2017; Womack, Hook, Ramos, Davis, & Penberthy, 2013) reported that more than 30 methods and instruments were used in prior studies to assess CSB with varying reliability and validity. This lack of consistency improved with the publication of the proposed diagnostic criteria for HD (Kafka, 2010), and the assessment of CSB started to converge. As a result, the Hypersexual Behavior Inventory (Reid, Garos, & Carpenter, 2011) was recommended to be used in large-scale survey studies (Bőthe, Kovács et al., 2019b; Karila et al., 2014; Montgomery-Graham, 2017; Womack et al., 2013). HD has been associated with CSB, and many individuals in treatment for HD (>80%) report problems with pornography use (Reid, Carpenter, et al., 2012a). However, with the rejection of HD and the introduction of CSBD in the ICD-11, a valid and reliable measure to assess CSBD is lacking. CSBD may represent a global phenomenon in all genders (Dickenson, Gleason, Coleman, & Miner, 2018) even though cross-cultural (Klein, Jurin, Briken, & Štulhofer, 2015) and gender-based (Bőthe, Bartók et al., 2018a) studies examining CSBD are largely lacking. Although most studies of CSB include predominantly male samples and less is known about CSB in women (Klein, Rettenberger, & Briken, 2014), gender-related differences in CSB may be smaller than previously suggested (Dickenson et al., 2018). Thus, it is important to develop a measure to assess CSBD psychometrically equivalently (i.e., demonstrating high levels of measurement invariance) across gender groups and different countries.

Aims of the present study
The primary aim was to develop a new self-report scale (CSBD-19) that can assess CSBD based on ICD-11 diagnostic guidelines/domains (i.e., control, salience, relapse, dissatisfaction, and negative consequences) across cultures and gender groups. We hypothesized that the CSBD-19 would be valid and reliable and demonstrate similar factor structures across three different countries (the United States, Hungary, and Germany) and in both women and men. We further hypothesized that men would show higher scores than women and that, across genders, CSBD-19 scores would correlate with measures of hypersexuality and problematic pornography use and, to a lesser extent, with other sexual activities and measures.

METHOD
Participants and procedure
Data were collected via online surveys; completion took approximately 30 minutes. Individuals aged 18 years or older could participate. For Sample 1, respondents were invited to participate via an advertisement on a large Hungarian news portal from May to July 2019. For Sample 2, a nationally representative probability sample of Hungarians who use the Internet at least once a week was randomly selected from an internet-based panel by a market research company (Solid Data ISA) in May 2019 (for similar methods see Orosz, Bruneau, et al., 2018a). For Sample 3, English-speaking participants were recruited in August 2019 via Amazon's Mechanical Turk (MTurk), a reliable data collection platform (Buhrmester, Kwang, & Gosling, 2011). For Sample 4, German-speaking participants were recruited between August and September 2019 through Internet forums of health care sites and social networks (e.g., Facebook). Based on prior recommendations for studies conducting factor analysis (VanVoorhis & Morgan, 2007), we aimed to recruit at least 300 participants in each sample to ensure that the analyses would not be underpowered. However, we did not set an upper limit for participation.

Measures
Compulsive Sexual Behavior Disorder Scale (CSBD-19). The five factors of the CSBD-19 were based on the ICD-11 diagnostic guidelines for CSBD (see Appendix 1): control (i.e., failure to control CSB), salience (i.e., CSB being the central focus of one's life), relapse (i.e., unsuccessful efforts to reduce CSB), dissatisfaction (i.e., experiencing less or no satisfaction from sexual behaviors), and negative consequences (i.e., CSB generating clinically significant distress or impairment). The negative consequences factor included items related to general and domain-specific adverse consequences. Based on pre-established guidelines (Bőthe, …), the authors created and evaluated six items each for the control, salience, relapse, and dissatisfaction factors. Given that the negative consequences factor included several domains of negative consequences, six items covering neglect and adverse consequences in general, and three items per domain covering specific negative consequences, were included in the initial item set. When creating the items, the authors also considered potential items from the most frequently used prior scales assessing CSBD-related symptoms (i.e., the Hypersexual Behavior Inventory (Reid, Garos, & Carpenter, 2011) and the Hypersexual Behavior Consequences Scale (Reid, Garos, & Fong, 2012b)). Before participants indicated their levels of agreement with each item on a four-point scale (1 = "totally disagree", 4 = "totally agree"), they were provided with a definition for "sex" as used in the scale (see Appendix 2). Higher scores on the scale indicate higher levels of CSB. The different language versions are available in Appendix 2.
Hypersexual Behavior Inventory-Short Version (HBI-8). The short version of the HBI assesses hypersexuality with eight items. Participants indicated their answers on a five-point scale (1 = "never"; 5 = "very often"). The HBI-8 was administered in three samples and demonstrated excellent reliabilities (α Sample 1 = 0.87; α Sample 3 = 0.92; α Sample 4 = 0.86).

Sexuality, Masturbation, and Pornography Use-Related Questions (Bőthe, Bartók et al., 2018a). Respondents indicated the total number of lifetime sexual partners and casual sexual partners (defined as engaging in sexual activities with someone outside of a relationship) on 16-point scales (1 = "0", 16 = "more than 50"). Participants reported their past-year sexual frequencies with their established and casual partners (if they had any), their frequency of masturbation, and their frequency of pornography use on 11-point scales (1 = "never", 11 = "more than 7 times a week").

Statistical analysis
SPSS 25 and Mplus 7.3 were used to conduct the statistical analyses. First, the initial item set of the CSBD-19 was examined to select the best items representing each factor based on the combined guidelines of prior work (Marsh et al., 2005; Orosz et al., 2016; Orosz, Tóth-Király et al., 2018b). Confirmatory factor analysis (CFA) was conducted on each sample to cross-validate results. Commonly used goodness-of-fit indices were applied to evaluate models (Hu & Bentler, 1999): Comparative Fit Index (CFI; ≥ .90 acceptable), Tucker-Lewis Index (TLI; ≥ .90 acceptable), and Root-Mean-Square Error of Approximation (RMSEA; ≤ .08 acceptable) with its 90% confidence interval. Assumptions of multivariate analyses were examined, and all assumptions except normality were met (see Appendix 3). To compensate for the naturally non-normal distribution of the data, items were treated as categorical indicators, and the mean- and variance-adjusted weighted least-squares estimator (WLSMV) was used (Finney & DiStefano, 2006).
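As an illustration of how the fit cut-offs quoted above can be applied, the following minimal Python sketch checks a model's indices against them. The function name and the example values are hypothetical and are not taken from the study; the analyses themselves were run in Mplus.

```python
def fit_is_acceptable(cfi: float, tli: float, rmsea: float) -> bool:
    """Return True when all three indices meet the cut-offs quoted in the text.

    Cut-offs: CFI >= .90, TLI >= .90, RMSEA <= .08.
    """
    return cfi >= 0.90 and tli >= 0.90 and rmsea <= 0.08


# Hypothetical values for illustration only (not results from the study).
print(fit_is_acceptable(cfi=0.95, tli=0.94, rmsea=0.05))
print(fit_is_acceptable(cfi=0.89, tli=0.95, rmsea=0.05))
```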
Given that an important point in the assessment of psychological instruments is whether they can be used among individuals from different backgrounds (e.g., different sociodemographic characteristics), it is important to test measurement invariance at high levels (e.g., latent mean invariance) that can ensure the generalizability of the instrument and its constructs (Meredith, 1993; Millsap, 2011; Tóth-Király, Bőthe, Rigó, & Orosz, 2017; Vandenberg & Lance, 2000). For example, if a scale behaves differently in different populations (i.e., high levels of measurement invariance are not achieved), it may lead to measurement biases and invalid comparisons between examined groups. To test measurement invariance between language-based groups (i.e., Hungarian, English, and German) and gender-based groups (i.e., men and women), we conducted multi-group CFAs using each sample (Bőthe, Bartók et al., 2018a; Tóth-Király et al., 2017; Vandenberg & Lance, 2000). Six levels of invariance were tested and compared with increasingly constrained parameters: configural (i.e., factor loadings and thresholds were freely estimated), metric (i.e., factor loadings were constrained to be equal), scalar (i.e., factor loadings and thresholds were constrained to be equal), residual (i.e., residual variances were constrained to be equal), latent variance-covariance (i.e., factor loadings, thresholds, uniquenesses, variances, and covariances were constrained to be equal), and latent mean invariance (i.e., factor loadings, thresholds, uniquenesses, variances, covariances, and means were constrained to be equal). Relative to the preceding model, decreases in CFI and TLI no larger than .010 (ΔCFI ≤ .010; ΔTLI ≤ .010) and increases in RMSEA no larger than .015 (ΔRMSEA ≤ .015) indicated that a given level of measurement invariance was achieved (Chen, 2007; Cheung & Rensvold, 2002).
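The stepwise decision rule described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Fit` container, the function names, and the example fit values are hypothetical, and the rule is read as "the drop in CFI/TLI and the rise in RMSEA relative to the preceding model must stay within the quoted limits".

```python
from typing import List, NamedTuple, Tuple


class Fit(NamedTuple):
    cfi: float
    tli: float
    rmsea: float


EPS = 1e-9  # tolerance so that floating-point noise does not fail boundary cases


def invariance_holds(prev: Fit, curr: Fit) -> bool:
    """Compare a more constrained model (curr) with the preceding one (prev)."""
    return (
        prev.cfi - curr.cfi <= 0.010 + EPS        # decrease in CFI
        and prev.tli - curr.tli <= 0.010 + EPS    # decrease in TLI
        and curr.rmsea - prev.rmsea <= 0.015 + EPS  # increase in RMSEA
    )


def highest_level(fits: List[Tuple[str, Fit]]) -> str:
    """fits is ordered from configural upward; return the last level that held."""
    achieved = fits[0][0]
    for (_, prev), (label, curr) in zip(fits, fits[1:]):
        if not invariance_holds(prev, curr):
            break
        achieved = label
    return achieved


# Hypothetical fit values for illustration only (not results from the study).
example = [
    ("configural", Fit(0.990, 0.988, 0.040)),
    ("metric", Fit(0.988, 0.987, 0.042)),
    ("scalar", Fit(0.960, 0.958, 0.070)),
]
print(highest_level(example))
```

With these made-up values the scalar model loses more than .010 in CFI, so the sketch stops at the metric level.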
Cronbach's alpha (≥ .70 acceptable) and composite reliability (CR; > .60 acceptable) were calculated to assess the reliability of the CSBD-19. To examine the criterion and convergent validity of the CSBD-19, we assessed associations with theoretically relevant correlates.
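For readers who want to reproduce the two reliability indices, a minimal Python sketch is given below. It uses the textbook formulas; the composite reliability function assumes standardized loadings with error variances equal to 1 − λ², which is an assumption of this sketch and not a statement of how the values in the study were computed. The data shown are toy values.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """rows: one sequence of item responses per respondent."""
    k = len(rows[0])
    item_var_sum = sum(variance(col) for col in zip(*rows))  # per-item variances
    total_var = variance([sum(r) for r in rows])             # variance of sum scores
    return k / (k - 1) * (1 - item_var_sum / total_var)


def composite_reliability(loadings: Sequence[float]) -> float:
    """CR from standardized loadings, assuming error variance = 1 - loading**2."""
    num = sum(loadings) ** 2
    err = sum(1 - lam ** 2 for lam in loadings)
    return num / (num + err)


# Toy data for illustration only (not data from the study).
toy = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]
print(round(cronbach_alpha(toy), 3))
print(round(composite_reliability([0.8, 0.8, 0.8]), 3))
```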
To increase the clinical utility of the CSBD-19, we determined a score that could potentially differentiate individuals with and without CSBD. First, we conducted latent profile analysis (LPA) with the robust maximum likelihood estimator on the combined sample to identify a subgroup of individuals who may display symptoms of CSBD (Collins & Lanza, 2010). We used the following indices to determine the number of latent classes based on the factors of the CSBD-19: entropy (with higher values indicating higher accuracy), the Akaike Information Criterion (AIC), the bias-corrected Akaike Information Criterion (CAIC), the Bayesian Information Criterion (BIC), and the Sample-Size Adjusted Bayesian Information Criterion (SSABIC), where lower values indicate more parsimonious models. We also used the Lo-Mendell-Rubin Adjusted Likelihood Ratio Test (L-M-R Test) to compare the estimated models. A statistically significant P-value (P < 0.05) suggests that the model with more classes fits the data better. Second, based on membership in the high-risk group in the LPA, we calculated sensitivity (proportion of individuals in the high-risk group who scored at or above a given threshold), specificity (proportion of individuals outside the high-risk group who scored below that threshold), positive predictive value (proportion of individuals with positive test results who were correctly categorized as being at high risk of CSBD), negative predictive value (proportion of individuals with negative test results who were correctly categorized as not being at high risk of CSBD), and accuracy values for potential scores on the CSBD-19 (Altman & Bland, 1994a, 1994b; Glaros & Kline, 1988).
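The five screening indices named above follow directly from a 2 × 2 cross-tabulation of test result by reference-class membership. The Python sketch below shows the standard definitions; the function name and the counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from typing import Dict


def screening_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard indices from true/false positives and negatives."""
    return {
        "sensitivity": tp / (tp + fn),   # reference positives detected by the test
        "specificity": tn / (tn + fp),   # reference negatives cleared by the test
        "ppv": tp / (tp + fp),           # positive results that are correct
        "npv": tn / (tn + fn),           # negative results that are correct
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Toy counts for illustration only (not counts from the study).
m = screening_metrics(tp=90, fp=30, fn=10, tn=870)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```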

Ethics
The authors assert that all procedures contributing to this work comply with the ethical standards of the relevant national and institutional committees on human experimentation and with the Helsinki Declaration of 1975, as revised in 2008. All procedures involving human subjects/patients were approved by the Research Ethics Committee of the Eötvös Loránd University (2016/286-3) and the Institutional Review Board of the Centre of Psychosocial Medicine/University Medical Center Hamburg-Eppendorf (LPEK-0060). Informed consent was obtained from all participants before enrollment.

RESULTS
Item analysis and item reduction
To obtain a short scale, we first evaluated each item based on the following criteria (Marsh et al., 2005; Orosz et al., 2016; Orosz, Tóth-Király et al., 2018b): (a) having high corrected item-total correlations, (b) having high standardized factor loadings, (c) having relatively low skewness and kurtosis values, and (d) best covering the breadth of the factor's content (i.e., subjective evaluations from experts in clinical psychology, addiction, sex research, and scale development). Then, we selected the items that best represented the pre-established factors' content and had strong psychometric properties (Appendix 4). As a result, 19 items representing the five pre-established factors of CSBD were retained for further analyses: three items each for the control, salience, relapse, and dissatisfaction factors, and seven items for the negative consequences factor.
The dimensionality, structural validity, and reliability of the CSBD-19
Given the theory-based factors of the CSBD-19 (World Health Organization, 2019) (Appendix 1), CFAs were conducted on the selected items in each sample separately to examine the factor structure of the CSBD-19. The inter-factor correlations in each sample are presented in Appendix 5. The five-factor, first-order model had an excellent fit to the data in each language-based sample (Table 1). The standardized factor loadings and the descriptive statistics of the scale are presented in Table 2. The CSBD-19 and its factors demonstrated adequate reliability in each sample (Table 2).
Table 1 note. WLSMV = weighted least squares mean- and variance-adjusted estimator; χ² = Chi-square; df = degrees of freedom; CFI = comparative fit index; TLI = Tucker-Lewis Index; RMSEA = root-mean-square error of approximation; 90% CI = 90% confidence interval of the RMSEA; ΔCFI = change in CFI value compared to the preceding model; ΔTLI = change in the TLI value compared to the preceding model; ΔRMSEA = change in the RMSEA value compared to the preceding model. Bold letters indicate the final levels of invariance that were achieved. In the language-based comparison, the highest level of measurement invariance (i.e., latent mean invariance) was achieved, indicating that the CSBD-19 functions the same way in each examined language version. In the gender-based comparison, latent variance-covariance invariance was achieved, but latent mean invariance was not, indicating important latent mean differences between men and women. *P < 0.001.
To lend further support for the validity of the CSBD-19 and to ensure that language-based comparisons are meaningful, we examined the invariance of the factor structure of the CSBD-19 across the four samples. Baseline models were estimated for each group and, then, parameters were gradually constrained.
The fit indices suggested that the highest level of invariance (latent mean invariance) was achieved, indicating that the CSBD-19 appears to function the same way in each language version (Table 1). Next, we conducted measurement invariance testing to examine the factor structure of the CSBD-19 across genders (men vs. women) on a combined sample, including Samples 1-4. Fit indices suggested that latent variance-covariance invariance was achieved, but latent mean invariance was not, suggesting the presence of latent mean differences between men and women (Table 1). Using the variance-covariance model, latent mean differences between men and women are expressed in SD units and are accompanied by tests of statistical significance. When men's latent means were constrained to zero for the purpose of model identification, women's latent means proved to be substantially lower on all factors (Control: −0.47 SD, P < 0.001; Salience: −0.59 SD, P < 0.001; Relapse: −0.65 SD, P < 0.001; Negative Consequences: −0.31 SD, P < 0.001) except for the Dissatisfaction factor (0.01 SD, P = 0.612).
PPCS-6 scores and weak-to-moderate, positive associations with frequencies of pornography use, masturbation, and having sex with casual partners in each sample. CSBD-19 scores had weak, positive associations with the numbers of sexual partners and casual sexual partners in one's lifetime in each sample. However, CSBD-19 scores were unrelated or weakly and negatively related to the frequency of past-year sexual activity with one's partner (Table 3).
Determination of a potential threshold score for the CSBD-19
First, latent profile analysis was conducted on the five factors of the CSBD-19 in the combined sample. The AIC, BIC, and SSABIC values continuously decreased as more latent classes were added, and all solutions had high levels of accuracy (based on entropy). The L-M-R Test suggested that the six-class solution should be favored over the seven-class solution; thus, we used these six classes in further analysis (see Appendix 6). The fourth class (high-risk class; 260 participants, 2.8%) represented individuals at high risk of CSBD (Appendix 7). The characteristics of the identified classes are presented in Table 4. The high-risk class demonstrated significantly higher scores on the CSBD-19 (with the largest score differences on the negative consequences factor), HBI-8, and PPCS-6 than the other classes. The high-risk class had the highest number of lifetime sexual partners and casual sexual partners and the highest frequency of past-year masturbation and pornography use. Based on membership in the high-risk class as a "gold standard", the sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy of potential threshold scores were calculated for the CSBD-19 (Appendix 8). A score of 50 points was suggested as an optimal cut-off for classification as being at high risk of CSBD. For this threshold, the sensitivity was 98.5%, the specificity was 99.1%, the PPV was 76.4%, the NPV was 100%, and the accuracy was 99.1%. In practical terms, 0.9% of low-risk individuals were misidentified as high-risk individuals, while 1.5% of the "true" high-risk individuals were not recognized by the CSBD-19.
Approximately one-quarter of the individuals with a positive test result (scoring ≥ 50 on the CSBD-19) were mistakenly identified as high-risk individuals; however, almost everyone with a negative result (scoring < 50) was identified correctly as a low-risk individual. Using the established threshold, 4.2% of men and 2.0% of women in Sample 1; 5.2% of men and 3.3% of women in Sample 2; 7.0% of men and 5.5% of women in Sample 3; and 5.6% of men and 0% of women in Sample 4 were classified as being at high risk of CSBD.
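The reason a test with very high sensitivity and specificity can still yield a PPV near three-quarters is the low base rate of the high-risk class. The Python sketch below recomputes PPV and NPV from the rounded rates reported above (sensitivity 98.5%, specificity 99.1%, high-risk class 2.8%); because the inputs are rounded, the result only approximates the reported 76.4%. The function names are hypothetical.

```python
def ppv_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Negative predictive value via Bayes' rule."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


# Rounded rates as reported in the text; the output is therefore approximate.
ppv = ppv_from_rates(0.985, 0.991, 0.028)
npv = npv_from_rates(0.985, 0.991, 0.028)
print(f"PPV ~ {ppv:.3f}, NPV ~ {npv:.4f}")
```

The same functions make it easy to see how the PPV would change in a setting with a different base rate, for example a treatment-seeking sample, although no such values are reported in the present study.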

DISCUSSION
A measure for assessing ICD-11-defined CSBD (World Health Organization, 2019) is necessary to address current gaps in treatment and research. We developed the CSBD-19 and tested its psychometric properties across three languages in four samples, demonstrating robust psychometric properties in terms of factor structure, reliability, measurement invariance, and associations with theoretically relevant constructs. A threshold was determined that can identify individuals at high risk of CSBD. Initial findings suggest the CSBD-19 may have clinical utility, although further research is needed to test and refine the CSBD-19 with clinical and nonclinical samples.
The construct validity and reliability of the CSBD-19 were cross-validated in three languages in four independent samples from the United States, Hungary, and Germany. Not only was the construct validity of the CSBD-19 supported, but its convergent validity was also established by its positive, strong association with the HBI-8. In line with previous findings, CSBD-19 scores demonstrated positive, strong associations with measures of problematic pornography use (Bőthe, Koós, Tóth-Király, Orosz, & Demetrovics, 2019a; Bőthe, Tóth-Király et al., 2019c), and positive, weak-to-moderate associations with the past-year frequency of pornography use, past-year frequency of masturbation, and the number of lifetime sexual and casual sexual partners (Bőthe, Kovács et al., 2019b). The frequency of past-year sexual activities with one's partner was unrelated to the CSBD-19 scores, in line with prior findings from large-scale studies (Štulhofer, Jurin, & Briken, 2016).
High levels of measurement invariance were demonstrated across language-based and gender-based groups. In the case of language-based groups, the highest level of invariance was achieved, suggesting that the CSBD-19 may be used reliably in future cross-cultural studies assessing CSBD and that differences in CSBD scores may be attributed to actual differences between the language-based samples, and not to methodological shortcomings (Király et al., 2019).
Prevalence estimates for being at high risk of CSBD ranged from 0% to 5.5% for women and from 4.2% to 7.0% for men in the present study. The observed variation in prevalence rates across countries may be in part explained by the different recruitment methods used (i.e., news portal, research panel, and social media). However, the results support the notion that gender-related differences in CSBD may be smaller than existing data may suggest (Dickenson et al., 2018; Erez, Pilver, & Potenza, 2014). Previous research shows that gender norms may influence sexual desire in women (Rubin et al., 2019), and suggests a possible role for moral incongruence in self-reported problems with CSB (Grubbs, Perry, Wilt, & Reid, 2019). Different prevalence estimates, especially among women, may be related to differences in gender and sexual norms, moral values, and religiosity among the three countries. Although this explanation is rather speculative, future research should examine this possibility. The scale, nevertheless, demonstrated high levels of reliability and validity among both men and women and may be used in both groups, although further testing with women is recommended.
Based on the results of the LPA, six groups were identified and could be reliably distinguished based on their CSBD characteristics. Approximately 85% of the participants belonged to the low- and average-risk classes. Individuals in these classes also reported lower levels of lifetime and past-year sexual activities (e.g., number of lifetime sexual partners or past-year pornography use frequency) than participants in the at-risk and high-risk classes. A minority (7.8%) of participants was included in the at-risk class; these participants demonstrated slightly elevated levels of CSBD compared to the average-risk class. Two higher-risk groups were identified. The first (i.e., satisfied at-risk class) included 4.5% of participants, and they reported elevated levels on all factors of the CSBD-19 except for the dissatisfaction criterion. The findings suggest that these individuals may experience uncontrollable sexual activities, but they are not dissatisfied with their sexual activities, and they do not experience as many negative consequences as people in the high-risk class. These individuals may have higher levels of sexual desire that may result in some characteristics similar to CSBD, but without some important indicators of CSBD (Štulhofer et al., 2016). Lastly, a high-risk group of CSBD (2.8%) was identified who also demonstrated the highest levels of problematic pornography use and other sexual activities. The percentage of high-risk individuals is in line with prior estimates that suggest that CSBD could be experienced by 1-10% of the general adult population (Montgomery-Graham, 2017).
Finally, the sensitivity and specificity analyses and the positive and negative predictive values suggest an optimal threshold score of 50 points (out of 76 points) that may identify individuals at high risk of CSBD. Despite the high accuracy of the recommended cut-off score, it should be noted that only community samples (i.e., not clinical samples) were examined in the present study. Moreover, self-report scales (such as the CSBD-19) should only be used as a first step (screening) of the diagnostic process followed by clinical interviews (Bőthe, Kovács et al., 2019b). Future studies should further validate this threshold in treatment-seeking clinical samples to extend the present findings and provide evidence for the clinical validity and utility of the CSBD-19.
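To make the scoring rule concrete, the sketch below sums 19 responses on the four-point scale (giving the 19 to 76 range implied by the text) and flags totals of 50 or more. It assumes a simple unweighted sum with no reverse-scored items, which is an assumption of this sketch; the function names are hypothetical, and a flagged score is a screening result only, not a diagnosis.

```python
from typing import Sequence

N_ITEMS = 19           # number of CSBD-19 items
MIN_RESP, MAX_RESP = 1, 4  # four-point response scale
THRESHOLD = 50         # suggested cut-off from the text


def csbd19_total(responses: Sequence[int]) -> int:
    """Unweighted sum score (assumed scoring rule); range 19 to 76."""
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    if any(r < MIN_RESP or r > MAX_RESP for r in responses):
        raise ValueError("each response must be between 1 and 4")
    return sum(responses)


def screens_high_risk(responses: Sequence[int], threshold: int = THRESHOLD) -> bool:
    """True when the total is at or above the threshold (screening only)."""
    return csbd19_total(responses) >= threshold


# Hypothetical response pattern for illustration only.
answers = [3] * 12 + [2] * 7
print(csbd19_total(answers), screens_high_risk(answers))
```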
To summarize, the CSBD-19 was developed by following rigorous guidelines, yielded strong psychometric properties in three languages in four large samples, and differentiated between individuals with and without a high risk of CSBD. Despite its strengths, the present study had some limitations that should be noted. The study used cross-sectional, self-reported data; thus, the results may be prone to biases (e.g., social desirability). Also, the study was conducted using only community samples; therefore, the clinical validity and utility of the CSBD-19 require further investigation. Future studies are needed to examine the construct validity of the CSBD-19 by conducting within-network and between-network studies on different populations, such as in clinical settings, or in different cultures, considering the potential role of moral incongruence in perceived CSBD (Grubbs et al., 2019). Although the CSBD-19 was developed in an international setting and its psychometric properties were tested in Europe and the US, the present study is only the first step in a thorough examination of the CSBD-19. Future studies are needed to examine the reliability and the validity of the CSBD-19 in other countries and cultures (e.g., Eastern cultures) (Chen & Jiang, 2020).

Conclusions and implications
The CSBD-19 is a short, valid, and reliable measure of CSBD based on ICD-11 diagnostic guidelines (World Health Organization, 2019). It can be included in large-scale, cross-cultural, multi-language studies, and can reliably distinguish between individuals at elevated and lower risk of CSBD. The use of the CSBD-19 should help to identify and study individuals with CSBD. Thus, the incomparability of findings (Karila et al., 2014; Montgomery-Graham, 2017; Womack et al., 2013), a major problem in research addressing compulsive, impulsive, and addictive sexual behaviors, may be eliminated, and cross-cultural research on CSBD may be facilitated.