  • 1 Department of Post-doctoral Research, Credit Reference Center, The People's Bank of China, Chengfangjie No. 32, Xichengqu, Beijing 100800, China
  • 2 Department of Post-doctoral Research, Financial Research Institute, The People's Bank of China, Chengfangjie No. 32, Xichengqu, Beijing 100800, China
  • 3 Department of MSI, Center for R&D Monitoring (ECOOM), Katholieke Universiteit Leuven, Waaistraat 6, 3000 Leuven, Belgium
  • 4 Hungarian Academy of Sciences, IRPS, Budapest, Hungary
  • 5 ESAT-SCD & K.U. Leuven-IBBT Future Health Department, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, 3001 Leuven, Belgium

Abstract

Previous studies have shown that hybrid clustering methods based on textual and citation information outperform clustering methods that use only one of these components. However, earlier methods focus on the vector space model. In this paper we apply a hybrid clustering method based on the graph model to map the Web of Science database as reflected in the journals covered by the database. Compared with earlier hybrid clustering strategies, our method is very fast and even achieves better clustering accuracy. In addition, it detects the number of clusters automatically and provides a top-down hierarchical analysis, which suits practical application. We quantitatively and qualitatively assess the added value of such an integrated analysis, and we investigate whether the clustering outcome provides an appropriate representation of the field structure by comparing it with text-only and citation-only clustering and with another hybrid method based on a linear combination of distance matrices. Our dataset consists of about 8,000 journals published in the period 2002–2006. The cognitive analysis, including the ranked journals, term annotation, and visualization of the cluster structure, demonstrates the efficiency of our strategy.
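
As a rough illustration of the ideas named in the abstract, the sketch below combines a textual and a citation similarity matrix into one weighted journal graph and scores candidate partitions with Newman's weighted modularity. It is not the authors' implementation: the function names, the mixing weight `alpha`, the use of similarities rather than distances, and the four-journal toy matrices are all assumptions made for this example.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def combine_similarities(text_sim: Matrix, cite_sim: Matrix,
                         alpha: float = 0.5) -> Matrix:
    """Edge weights alpha*text + (1-alpha)*citation, with self-loops removed.

    `alpha` is a hypothetical mixing weight, not a value from the paper.
    """
    n = len(text_sim)
    return [
        [0.0 if i == j
         else alpha * text_sim[i][j] + (1.0 - alpha) * cite_sim[i][j]
         for j in range(n)]
        for i in range(n)
    ]


def modularity(weights: Matrix, labels: Sequence[int]) -> float:
    """Newman's modularity Q for a symmetric weighted graph without self-loops."""
    n = len(weights)
    strength = [sum(row) for row in weights]  # weighted degree of each node
    two_m = sum(strength)                     # twice the total edge weight
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                q += weights[i][j] - strength[i] * strength[j] / two_m
    return q / two_m


# Toy data (invented): journals 0-1 and 2-3 form two obvious groups.
text_sim = [[1.0, 0.9, 0.1, 0.1],
            [0.9, 1.0, 0.1, 0.1],
            [0.1, 0.1, 1.0, 0.9],
            [0.1, 0.1, 0.9, 1.0]]
cite_sim = [[1.0, 0.7, 0.1, 0.1],
            [0.7, 1.0, 0.1, 0.1],
            [0.1, 0.1, 1.0, 0.7],
            [0.1, 0.1, 0.7, 1.0]]

hybrid = combine_similarities(text_sim, cite_sim, alpha=0.5)
q_good = modularity(hybrid, [0, 0, 1, 1])   # groups match the toy structure
q_bad = modularity(hybrid, [0, 1, 0, 1])    # groups cut across it
q_single = modularity(hybrid, [0, 0, 0, 0]) # everything in one cluster
```

On this toy graph the partition that matches the two groups scores highest. A modularity-optimizing algorithm searches for such partitions directly, which is one way the number of clusters can emerge from the data instead of being fixed in advance.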
