With the rapid development of modern technology, most entities can be observed from different perspectives. This multi-view information allows us to find better patterns, provided that the views are integrated appropriately. Consequently, clustering by integrating multi-view representations of the same class of entities has become a crucial issue in knowledge discovery. We integrate multi-view data in a tensor model and present a hybrid clustering method based on the Tucker-2 model, which can be regarded as an extension of spectral clustering. We apply our hybrid clustering method to the analysis of scientific publications by integrating citation links and lexical content. Clustering experiments are conducted on a large-scale journal set retrieved from the Web of Science (WoS) database. Several related hybrid clustering methods are compared with our method. The analysis of the clustering results demonstrates the effectiveness of the proposed algorithm. Furthermore, we provide a cognitive analysis of the clustering results, as well as a visualization that maps the journal set.
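The abstract describes the method only at a high level. As a rough illustration of how a Tucker-2 model can extend spectral clustering to several views, the Python sketch below makes the following assumptions, none of which are taken from the paper: each view (for example, citation links or lexical similarity) is given as a symmetric non-negative n×n affinity matrix; the degree-normalized matrices are stacked as frontal slices of an n×n×m tensor; the Tucker-2 model compresses the two object modes with one shared orthonormal factor U and leaves the view mode uncompressed; U is fitted by a HOOI-style alternating eigenvector update and then clustered with k-means. The function names (`normalized_affinity`, `tucker2_shared_factor`, `kmeans`, `hybrid_cluster`), the choice rank = number of clusters, and the row normalization are hypothetical choices for illustration, not the authors' exact algorithm.

```python
import numpy as np


def normalized_affinity(S):
    """Return D^{-1/2} S D^{-1/2} for a symmetric non-negative affinity matrix S."""
    d = S.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(d)  # assumes every object has positive degree
    return S * inv_sqrt[:, None] * inv_sqrt[None, :]


def tucker2_shared_factor(slices, rank, n_iter=50, tol=1e-9):
    """HOOI-style fit of a symmetric Tucker-2 model (hypothetical helper).

    Approximates each view slice A_k by U G_k U^T (the view mode is left
    uncompressed) by maximising sum_k ||U^T A_k U||_F^2 subject to U^T U = I.
    """
    # Start from the leading eigenvectors of the summed slices.
    _, vecs = np.linalg.eigh(sum(slices))
    U = vecs[:, -rank:]  # eigh sorts eigenvalues in ascending order
    prev = -np.inf
    for _ in range(n_iter):
        # Fix the mode-2 factor at U and update mode 1: the best U is
        # given by the top-`rank` eigenvectors of M = sum_k A_k U U^T A_k.
        M = sum(A @ U @ U.T @ A for A in slices)
        _, vecs = np.linalg.eigh(M)
        U = vecs[:, -rank:]
        obj = sum(np.linalg.norm(U.T @ A @ U) ** 2 for A in slices)
        if abs(obj - prev) < tol:
            break
        prev = obj
    return U


def kmeans(X, k, n_iter=100):
    """Plain Lloyd k-means with deterministic farthest-point seeding."""
    centers = [X[0]]
    for _ in range(1, k):
        # Pick the point farthest from all centers chosen so far.
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq, axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def hybrid_cluster(affinities, k):
    """Cluster objects described by several affinity matrices (one per view)."""
    slices = [normalized_affinity(S) for S in affinities]
    U = tucker2_shared_factor(slices, rank=k)
    # Row-normalise the embedding, as in Ng-Jordan-Weiss spectral clustering.
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    return kmeans(U, k)
```

For instance, with two 6×6 block-structured affinity matrices, one per view, `hybrid_cluster([S_cite, S_text], k=2)` should place the first three and the last three objects in separate clusters. Leaving the third (view) mode uncompressed is what distinguishes this Tucker-2 form from a full Tucker-3 decomposition.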