The number of citations received by authors in scientific journals has become a major parameter for assessing individual researchers and, through the impact factor, the journals themselves. A fair assessment therefore requires that the criteria for selecting references in a given manuscript be unbiased with regard to the authors or journals cited. In this paper, we assess citation practices against two recommendations that authors should follow while preparing a manuscript. (i) They should consider how similar a paper's content is to the topics investigated, lest related work be reproduced or ignored. (ii) They should perform a systematic search over the network of citations, including seminal or closely related papers. We apply the formalism of complex networks to two datasets of papers from the arXiv and Web of Science repositories, and show that neither criterion is fulfilled in practice. We represented the texts as complex networks and estimated a similarity index between pieces of text. The lists of references did not contain the most similar papers in the dataset. This was quantified with a consistency index, which reaches its maximum value of one when the references in a given paper are the most similar papers in the dataset. For the areas of “complex networks” and “graphenes”, the consistency index was only 0.11–0.23 and 0.10–0.25, respectively. To simulate a systematic search in the citation network, we employed two walks. The first is a traditional random walk (i.e. diffusion). In the second, the transition probabilities are proportional to the number of ingoing edges (the in-degree) of the neighbouring nodes. The frequency of visits to the nodes (papers) correlated only weakly with the actual lists of references in the papers. It also correlated weakly with the number of downloads from the arXiv repository. Therefore, the authors and the users of the repository apparently did not follow the criterion of a systematic search over the network of citations.
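The abstract does not spell out how the consistency index is computed. One plausible reading is sketched here as an assumption, not as the authors' exact definition. For a paper with k references, rank every other paper in the dataset by its similarity index to that paper. The index is then the fraction of the references that fall among the top k. It equals one exactly when the references are the k most similar papers. The function name `consistency_index` and its arguments are hypothetical. The similarity scores are taken as given inputs rather than computed from text networks.

```python
from typing import Dict, Set


def consistency_index(
    paper: str, references: Set[str], similarity: Dict[str, float]
) -> float:
    """Fraction of `paper`'s references found among its k most similar papers.

    Assumed definition: k = len(references), and `similarity` maps candidate
    paper ids to their similarity with `paper`. The result is 1.0 exactly when
    the references are the k most similar papers in the dataset.
    """
    k = len(references)
    if k == 0:
        raise ValueError("the paper has no references")
    # Rank every other paper by decreasing similarity to `paper`.
    candidates = [p for p in similarity if p != paper]
    ranked = sorted(candidates, key=lambda p: similarity[p], reverse=True)
    top_k = set(ranked[:k])
    return len(references & top_k) / k


# Toy example with made-up similarity scores (not data from the paper).
sim = {"A": 0.9, "B": 0.8, "C": 0.3, "D": 0.1}
print(consistency_index("P", {"A", "C"}, sim))  # 0.5: only A is in the top 2
```

In this sketch, ties in similarity are broken by the insertion order of `similarity`. References to papers outside the dataset can never be in the top k, so they lower the index. A dataset-level figure could then be obtained by averaging the index over all papers.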
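The two search strategies can be illustrated with a small simulation. This sketch rests on several assumptions that the abstract does not state:

- The citation graph is directed and stored as an adjacency list mapping each citing paper to the papers it cites.
- The walker follows references.
- The walker restarts at a uniformly random paper whenever it reaches a paper with no references.
- The function names are hypothetical, and Pearson's coefficient is used as the correlation measure.

In the unbiased walk, each cited paper is chosen with equal probability. In the biased walk, the probability of moving to a neighbour is proportional to that neighbour's in-degree. The visit counts can then be correlated with any per-paper quantity, such as the number of times each paper appears in reference lists or its download count.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence


def in_degrees(graph: Dict[str, List[str]]) -> Dict[str, int]:
    """Count incoming edges (citations received within the dataset) per paper."""
    deg = {node: 0 for node in graph}
    for cited in graph.values():
        for target in cited:
            deg[target] = deg.get(target, 0) + 1
    return deg


def random_walk_visits(
    graph: Dict[str, List[str]], steps: int, biased: bool, seed: int = 0
) -> Counter:
    """Count visits per paper during a random walk over the citation graph.

    biased=False: each cited paper is equally likely (plain diffusion).
    biased=True: a cited paper is chosen with probability proportional
    to its in-degree.
    """
    rng = random.Random(seed)
    deg = in_degrees(graph)
    nodes = list(graph)
    current = rng.choice(nodes)
    visits: Counter = Counter()
    for _ in range(steps):
        visits[current] += 1
        neighbours = graph.get(current, [])
        if not neighbours:
            # Assumption: restart uniformly at random from papers without references.
            current = rng.choice(nodes)
        elif biased:
            # Every cited paper has in-degree >= 1, so all weights are positive.
            weights = [deg[n] for n in neighbours]
            current = rng.choices(neighbours, weights=weights, k=1)[0]
        else:
            current = rng.choice(neighbours)
    return visits


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient (assumes neither series is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Toy citation graph: each paper maps to the papers it cites.
    graph = {"P1": ["P2", "P3"], "P2": ["P3"], "P3": [], "P4": ["P2", "P3"]}
    deg = in_degrees(graph)
    papers = sorted(graph)
    for biased in (False, True):
        visits = random_walk_visits(graph, steps=10_000, biased=biased)
        r = pearson([visits[p] for p in papers], [deg[p] for p in papers])
        print(f"biased={biased}: r = {r:.2f}")
```

The restart rule is only one way to keep the walk alive on papers without references. A teleportation probability at every step, as in PageRank, would be an equally reasonable alternative.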
Based on these results, we propose an approach that we believe is fairer for evaluating and complementing the citations of a given author, effectively leading to a virtual scientometry.