Authors: R. Rice, Christine Borgman, Diane Bednarski, and P. Hart
Citation analysis is a useful method for studying a wide range of topics in bibliometrics and the sociology of science. However, many challenges have been made to the validity and reliability of the underlying assumptions, the data, and the methods used in citation studies. This article addresses these issues in three parts. First is a brief review of validity and reliability issues in citation research. Next we explore measurement error in a principal source of journal-to-journal citation data, the Institute for Scientific Information's Journal Citation Reports. Possible sources of measurement error include discrepancies between citing and cited data, changed or deleted journal titles, aberrant abbreviations, and listing algorithms. The last section is a detailed description of ways to overcome some of the measurement errors. The data and examples are drawn from a journal-to-journal citation study in the fields of Communication, Information Science, and Library Science.
Authors: A. Rivas, J. Deshler, F. Quimby, H. Mohammed, D. Wilson, R. Gonzalez, D. Lein, and P. Bruso
Interdisciplinary synthesis and validity analysis (ISVA), a structured learning approach that integrates learning and communication theories, meta-analytic evaluation methods, and literature-management technologies, was applied to the 1993–1997 bovine mastitis research literature. This study investigated whether ISVA could (1) facilitate the analysis and synthesis of interdisciplinary knowledge claims and (2) generate new projects or research questions. The bovine mastitis literature was conceptualized as comprising microbiological, immunological, and epidemiological dimensions. Keywords covering these dimensions were searched in the Medline and Agricola databases. A final list of 148 articles was retrieved, analyzed, synthesized into fifteen information subsets, and evaluated for construct, internal, external, and statistical validity through an interdisciplinary, iterative, dialogical process. Validity threats were rephrased as new research or educational projects.
Using a random sample of 79 theorists selected from six of Mullins' theory groups, this study empirically assesses the validity of Mullins' theory-group classifications. The procedure uses multiple discriminant analysis based on four demographic-academic variables standardized relative to the publication date of each theorist's first major work. The discriminant analysis correctly classifies 70 percent of the 40 cases for which complete data were available, based on Mullins' initial categorizations. These results indicate that Mullins' classification schema has considerable construct validity, and they demonstrate the utility of multiple discriminant analysis as a technique for assessing other classificatory systems.
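As a rough illustration of the classification step described above, the following Python sketch performs nearest-centroid classification on standardized variables, which is multiple discriminant analysis under a simplifying identity-covariance assumption. This is not the study's actual analysis, and the data in the usage note are invented.

```python
def standardize(rows):
    """Z-score each column (variable) of a list of observation rows."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [(sum((x - m) ** 2 for x in c) / len(c)) ** 0.5
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, sds)] for r in rows]

def nearest_centroid_accuracy(rows, labels):
    """Assign each case to the nearest group centroid in standardized
    space and return the proportion correctly classified."""
    z = standardize(rows)
    groups = sorted(set(labels))
    cents = {g: [sum(col) / len(col) for col in
                 zip(*[r for r, l in zip(z, labels) if l == g])]
             for g in groups}
    dist2 = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    hits = sum(min(groups, key=lambda g: dist2(r, cents[g])) == l
               for r, l in zip(z, labels))
    return hits / len(labels)
```

With two well-separated hypothetical groups, e.g. `nearest_centroid_accuracy([[0, 0], [1, 0], [0, 1], [10, 10], [11, 10], [10, 11]], ["A", "A", "A", "B", "B", "B"])`, the function returns the "percent correctly classified" figure the abstract reports (here 1.0).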
This paper argues that research performance is essentially a multidimensional concept which cannot be encapsulated in a single universal criterion. Various indicators used in quantitative studies of research performance at the micro or meso level can be classified into two broad categories: (i) objective or quantitative indicators (e.g., counts of publications, patents, algorithms, or other artifacts of research output) and (ii) subjective or qualitative indicators, which represent the evaluative judgement of peers, usually measured on Likert or semantic differential scales. Because of their weak measurement properties, subjective indicators can also be designated as quasi-quantitative measures. This paper is concerned with the factorial structure and construct validity of quasi-quantitative measures of research performance used in a large-scale empirical study carried out in India. In this study, a reflective measurement model incorporating four latent variables (R&D effectiveness, Recognition, User-oriented effectiveness, and Administrative effectiveness) is assumed. The latent variables are operationalized through thirteen indicators measured on 5-point semantic differential scales. Convergent validity, discriminant validity, and reliability of the measurement model are tested using the LISREL procedure.
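The reliability side of such a multi-item measurement model is often summarized with Cronbach's alpha. The minimal pure-Python version below is an illustration of that standard index, not the paper's LISREL procedure, and the item scores in the usage note are invented.

```python
def cronbach_alpha(items):
    """Cronbach's alpha for a multi-item scale.
    items: one list per item, each holding the respondents' scores."""
    k = len(items)
    n = len(items[0])
    def var(xs):  # sample variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    totals = [sum(item[i] for item in items) for i in range(n)]
    return k / (k - 1) * (1 - sum(var(item) for item in items) / var(totals))
```

Perfectly consistent items, e.g. three items that each score four respondents as 1, 2, 3, 4, yield an alpha of 1.0; real semantic-differential data would fall below that.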
This paper examines the peer review procedure of a national science funding organization (the Swiss National Science Foundation) by means of the three most frequently studied criteria: reliability, fairness, and validity. The analyzed data consist of 496 applications for project-based funding from biology and medicine from the year 1998. Overall reliability is found to be fair, with an intraclass correlation coefficient of 0.41 and sizeable differences between biology (0.45) and medicine (0.20).
Multiple logistic regression models reveal only scientific performance indicators as significant predictors of the funding
decision while all potential sources of bias (gender, age, nationality, and academic status of the applicant, requested amount
of funding, and institutional surrounding) are non-significant predictors. Bibliometric analysis provides evidence that the
decisions of a public funding organization for basic project-based research are in line with the future publication success
of applicants. The paper also argues for an expansion of approaches and methodologies in peer review research by increasingly
focusing on process rather than outcome and by including a more diverse set of methods, e.g., content analysis. Such an expansion
will be necessary to advance peer review research beyond the abundantly treated questions of reliability, fairness, and validity.
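An intraclass correlation of the kind reported above can be computed from standard ANOVA mean squares. The sketch below implements the simplest one-way random-effects variant, ICC(1); the abstract does not specify which variant the study used, and the ratings in the usage note are invented.

```python
def icc1(ratings):
    """One-way random-effects intraclass correlation, ICC(1).
    ratings: one row per application, each with k reviewer scores."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    means = [sum(row) / k for row in ratings]
    msb = k * sum((m - grand) ** 2 for m in means) / (n - 1)    # between applications
    msw = sum((x - m) ** 2 for row, m in zip(ratings, means)
              for x in row) / (n * (k - 1))                     # within (reviewer disagreement)
    return (msb - msw) / (msb + (k - 1) * msw)
```

Perfect reviewer agreement, e.g. `icc1([[1, 1], [2, 2], [3, 3]])`, gives 1.0; values around 0.4, as in the study, indicate substantial disagreement between reviewers of the same application.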
This paper investigates the extent to which staff editors’ evaluations of submitted manuscripts—that is, internal evaluations
carried out before external peer reviewing—are valid. To answer this question we utilized data on the manuscript reviewing
process at the journal Angewandte Chemie International Edition. The results of this study indicate that the initial internal evaluations are valid. Further, it appears that external review
is indispensable for the decision on the publication worthiness of manuscripts: (1) For the majority of submitted manuscripts,
staff editors are uncertain about publication worthiness; (2) there is a statistically significant proportional difference
in “Rejection” between the editors' initial evaluation and the final editorial decision (after peer review); (3) three-quarters
of the manuscripts that were rated negatively at the initial internal evaluation but accepted for publication after the peer
review had far above-average citation counts.
Authors: Günter Krampen, Ralf Becker, Ute Wahner, and Leo Montada
Given the increasing significance of citation counting in evaluations of scientists and science institutes as well as in science historiography, this study empirically analyzes what is cited, how frequently, and which types of citations appear in scientific texts. Content analyses cover the number of references, self-references, the publication language of references cited, the publication types of references cited, and the type of citation within the texts. The validity of citation counting is examined empirically using random samples of English- and German-language journal articles as well as German textbooks, encyclopedias, and test manuals from psychology. Results show that 25% of all citations are perfunctory; more than 50% of references are journal articles and up to 40% are books and book chapters; and 10% are self-references. Differences between publications from various psychological sub-disciplines, publication languages, and types of publication are weak. Thus, the validity of evaluative citation counting is limited, both because at least one quarter of citations are perfunctory and therefore carry very little information, and because existing citation databases cover journal articles only.
Summary: In science, peer review is the best-established method of assessing manuscripts for publication and applications for research fellowships and grants. However, the fairness of peer review, its reliability and whether it achieves its aim to select the best science and scientists has often been questioned. The paper presents the first comprehensive study on committee peer review for the selection of doctoral (Ph.D.) and post-doctoral research fellowship recipients. We analysed the selection procedure followed by the Boehringer Ingelheim Fonds (B.I.F.), a foundation for the promotion of basic research in biomedicine, with regard to the reliability, fairness and predictive validity of the procedure - the three quality criteria for professional evaluations. We analysed a total of 2,697 applications, 1,954 for doctoral and 743 for post-doctoral fellowships. In 76% of the cases, the fellowship award decision was characterized by agreement between reviewers. Similar figures for reliability have been reported for the grant selection procedures of other major funding agencies. With regard to fairness, we analysed whether potential sources of bias, i.e., gender, nationality, major field of study and institutional affiliation, could have influenced decisions made by the B.I.F. Board of Trustees. For post-doctoral fellowship applications, no statistically significant influence of any of these variables could be observed. For doctoral fellowship applications, we found evidence of an institutional, major field of study and gender bias, but not of a nationality bias. The most important aspect of our study was to investigate the predictive validity of the procedure, i.e., whether the foundation achieves its aim to select as fellowship recipients the best junior scientists. Our bibliometric analysis showed that this is indeed the case and that the selection procedure is thus highly valid: research articles by B.I.F.
fellows are cited considerably more often than the "average" paper (average citation rate) published in the journal sets corresponding to the fields "Multidisciplinary", "Molecular Biology & Genetics", and "Biology & Biochemistry" in Essential Science Indicators (ESI) from the Institute for Scientific Information (ISI, Philadelphia, Pennsylvania, USA). Most of the fellows publish within these fields.
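The comparison to a field's average citation rate can be expressed as a simple normalized indicator: each paper's citation count divided by the field baseline, averaged over a set of papers. The function below is a generic sketch of that idea; the counts and baseline in the usage note are invented, not ESI figures.

```python
def mean_normalized_citation_rate(paper_citations, field_average):
    """Mean ratio of each paper's citation count to the field's
    average citation rate; values > 1 indicate above-average impact."""
    return sum(c / field_average for c in paper_citations) / len(paper_citations)
```

For example, three hypothetical papers cited 10, 20, and 30 times in a field averaging 10 citations per paper give `mean_normalized_citation_rate([10, 20, 30], 10)` = 2.0, i.e. twice the field average.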
Empirical work in the social studies of science has progressed rapidly with the availability and development of the citation indexes. Citation counts have become a widely accepted measure of the quality of a scientific contribution. However, there are several problems involved in the use of citation counts as a measure of quality in science. First, citation counts are sensitive to popular trends in science. In this sense, they approximate a Nielsen rating for science. Second, the distribution of citations restricts their utility to separating the extremes. Third, citation counts are not sensitive to the ethical and moral dimensions of the quality of a scientific contribution. Fourth, citation counts underestimate the contribution of applied scientists. This paper examines these limitations.