  • 1 Department of Sociology, Inforsk, Umeå University, 901 87 Umeå, Sweden
  • 2 Department of e-Resources, University Library, Stockholm University, 106 91 Stockholm, Sweden

Abstract

The measurement of similarity between objects plays a role in several scientific areas. In this article, we deal with document–document similarity in a scientometric context. We compare experimentally, using a large dataset, first-order with second-order similarities with respect to the overall quality of partitions of the dataset, where the partitions are obtained on the basis of optimizing weighted modularity. The quality of a partition is defined in terms of textual coherence. The results show that the second-order approach consistently outperforms the first-order approach. Each difference between the two approaches in overall partition quality values is significant at the 0.01 level.
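
The distinction between the two similarity orders, and the weighted modularity of a partition, can be illustrated with a minimal sketch. The abstract does not say which similarity measure, term weighting, or optimization procedure was used, so everything here is an assumption made for illustration: the cosine measure, the toy document-term matrix, and the function names `cosine_matrix`, `first_order`, `second_order`, and `weighted_modularity` are all hypothetical. First-order similarity compares the documents' own feature vectors; second-order similarity compares the documents' rows in the first-order similarity matrix, so two documents can be similar because they are similar to the same other documents.

```python
import numpy as np


def cosine_matrix(X):
    """Pairwise cosine similarity between the rows of X."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero rows as zeros instead of dividing by 0
    U = X / norms
    return U @ U.T


def first_order(doc_term):
    """First-order similarity: compare the documents' own feature vectors."""
    return cosine_matrix(doc_term)


def second_order(doc_term):
    """Second-order similarity: compare rows of the first-order matrix.

    Assumption: the diagonal (self-similarity) is kept in each row;
    other treatments of the diagonal are possible.
    """
    return cosine_matrix(first_order(doc_term))


def weighted_modularity(W, labels):
    """Weighted modularity of a partition for a symmetric weight matrix W.

    Q = (1 / 2m) * sum_ij (W_ij - s_i * s_j / 2m) * [c_i == c_j],
    where s_i is the strength (row sum) of node i and 2m = sum of W.
    """
    W = np.asarray(W, dtype=float)
    labels = np.asarray(labels)
    two_m = W.sum()
    strength = W.sum(axis=1)
    same_community = labels[:, None] == labels[None, :]
    expected = np.outer(strength, strength) / two_m
    return ((W - expected) * same_community).sum() / two_m


# Toy data (hypothetical): documents 0 and 2 share no terms,
# but both share a term with document 1.
doc_term = np.array([
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
])
S1 = first_order(doc_term)
S2 = second_order(doc_term)
print(S1[0, 2])  # first-order: no shared terms
print(S2[0, 2])  # second-order: positive, via the common neighbour

# Two disconnected pairs of nodes, partitioned into those pairs.
W = np.array([
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
])
print(weighted_modularity(W, [0, 0, 1, 1]))
```

In the toy data, documents 0 and 2 have a first-order similarity of zero but a positive second-order similarity, because both resemble document 1. To partition a real dataset, a similarity matrix (with the diagonal set to zero) would serve as the weight matrix whose modularity is optimized; the optimization itself is not shown here.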


Scientometrics
  • Language: English
  • Size: B5
  • Year of Foundation: 1978
  • Volumes per Year: 4
  • Issues per Year: 12
  • Founder: Akadémiai Kiadó
  • Founder's Address: H-1117 Budapest, Hungary 1516 Budapest, PO Box 245.
  • Publisher: Akadémiai Kiadó; Springer Nature Switzerland AG
  • Publisher's Address: H-1117 Budapest, Hungary 1516 Budapest, PO Box 245.; CH-6330 Cham, Switzerland Gewerbestrasse 11.
  • Responsible Publisher: Chief Executive Officer, Akadémiai Kiadó
  • ISSN 0138-9130 (Print)
  • ISSN 1588-2861 (Online)