  • 1 College of Information Science and Engineering & ERCMAMT, Wuhan University of Science and Technology, Heping Road No. 947, Wuhan 30081, Hubei, China
  • 2 Department of Electronic Engineering ESAT-SCD, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, 3001 Leuven, Belgium
  • 3 Center for R&D Monitoring (ECOOM), Department of MSI, Katholieke Universiteit Leuven, Waaistraat 6, 3000 Leuven, Belgium
  • 4 IRPS, Hungarian Academy of Sciences, Budapest, Hungary

Abstract

With the rapid development of modern technology, most entities can be observed from different perspectives. Such multi-view information allows us to find better patterns, provided that the views are integrated in an appropriate way. Clustering by integrating multi-view representations that describe the same class of entities has therefore become a crucial issue for knowledge discovery. We integrate multi-view data with a tensor model and present a hybrid clustering method based on the Tucker-2 model, which can be regarded as an extension of spectral clustering. We apply our hybrid clustering method to scientific publication analysis by integrating citation links and lexical content. Clustering experiments are conducted on a large-scale journal set retrieved from the Web of Science (WoS) database, and several relevant hybrid clustering methods are compared with our method. The analysis of the clustering results demonstrates the effectiveness of the proposed algorithm. Furthermore, we provide a cognitive analysis of the clustering results, together with a visualization that maps the journal set.
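The abstract describes the method only at a high level, so the following is a minimal sketch of Tucker-2-based multi-view clustering under stated assumptions, not the authors' implementation. It assumes each view (for example citation-link and lexical similarity between journals) is given as a symmetric nonnegative n × n affinity matrix, normalizes each view as in spectral clustering, stacks the views into an n × n × m tensor, and fits a symmetric Tucker-2 model A ≈ G ×₁ U ×₂ U with one orthonormal factor U shared by the first two modes and the view mode left uncompressed. The fit uses a HOOI-style alternating eigen-update (a heuristic, not guaranteed to reach the global optimum), and k-means is then run on the row-normalized rows of U. All function names (`normalized_affinity`, `tucker2_embedding`, `kmeans`, `hybrid_cluster`), the stopping rule, and the toy matrices are hypothetical illustrations; only NumPy is required.

```python
import numpy as np


def normalized_affinity(W):
    """Return D^{-1/2} W D^{-1/2} for a symmetric nonnegative affinity matrix W."""
    d = W.sum(axis=1)
    d_inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(np.maximum(d, 1e-300)), 0.0)
    return d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]


def tucker2_embedding(views, k, n_iter=50, tol=1e-10):
    """Fit A ~ G x1 U x2 U to the n x n x m tensor of normalized views.

    U (n x k) is orthonormal and shared by modes 1 and 2; mode 3 (the views)
    is not compressed. Maximizing ||G||_F^2 = sum_m ||U^T A_m U||_F^2
    minimizes the reconstruction error for orthonormal U.
    """
    A = [normalized_affinity(np.asarray(W, dtype=float)) for W in views]
    # Initialize with plain spectral clustering on the averaged view.
    _, vecs = np.linalg.eigh(sum(A) / len(A))
    U = vecs[:, -k:]
    prev = -np.inf
    for _ in range(n_iter):
        # HOOI-style update: leading eigenvectors of sum_m A_m U U^T A_m.
        M = sum(Am @ U @ U.T @ Am for Am in A)
        _, vecs = np.linalg.eigh(M)
        U = vecs[:, -k:]
        fit = sum(np.linalg.norm(U.T @ Am @ U) ** 2 for Am in A)
        if fit - prev < tol:  # stop once the fit no longer improves
            break
        prev = fit
    core = np.stack([U.T @ Am @ U for Am in A], axis=2)  # k x k x m core tensor G
    return U, core


def kmeans(X, k, n_iter=100):
    """Lloyd's k-means with a deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq, axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def hybrid_cluster(views, k):
    """Cluster entities by k-means on the row-normalized shared Tucker-2 factor."""
    U, _ = tucker2_embedding(views, k)
    X = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    return kmeans(X, k)


# Toy data (hypothetical): 6 journals in two groups, seen through two views.
citation = np.full((6, 6), 0.05)
citation[:3, :3] = citation[3:, 3:] = 1.0
text = np.full((6, 6), 0.1)
text[:3, :3] = text[3:, 3:] = 0.8
labels = hybrid_cluster([citation, text], k=2)
print(labels)
```

With a single view, the fitted U spans the eigenvectors of the normalized affinity matrix with the largest-magnitude eigenvalues, which is close to the embedding used by ordinary spectral clustering; this is one way to read the abstract's description of the method as an extension of spectral clustering. Keeping the view mode uncompressed lets the core tensor G record how strongly each view supports each pair of latent components.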
