  • 1 LORIA, Vandoeuvre-lès-Nancy (France)
  • 2 URI/INIST-CNRS, Vandoeuvre-lès-Nancy (France)

Abstract

The information analysis process includes a cluster analysis or classification step associated with expert validation of the results. In this paper, we propose new Recall/Precision measures for estimating the quality of a cluster analysis. These measures derive both from Galois lattice theory and from the Information Retrieval (IR) domain. Unlike classical inertia measures, they have the main advantage of being independent both of the classification method and of the difference between the intrinsic dimension of the data and that of the clusters. We present two experiments that use the MultiSOM model, an extension of Kohonen's SOM model, as the cluster analysis method. Our first experiment, on patent data, shows how our measures can be used to compare viewpoint-oriented classification methods, such as MultiSOM, with global cluster analysis methods, such as WebSOM. Our second experiment, carried out as part of the EICSTES EEC project, is an original Webometrics experiment that combines content and link classification, starting from a large non-homogeneous set of web pages. This experiment shows that break-even points between our different Recall/Precision measures can be used to determine an optimal number of clusters for web data classification. The contents of the clusters obtained with different break-even points are compared to assess the quality of the resulting maps.
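The abstract does not give the formulas for the Recall/Precision measures, so the following is only an illustrative sketch under stated assumptions, not the authors' definition. It assumes each document is a set of properties, treats a property as "peculiar" to the cluster in which it occurs most often, and averages per-cluster recall and precision over those peculiar properties. The names `cluster_recall_precision` and `closest_to_break_even` are hypothetical.

```python
from collections import defaultdict


def cluster_recall_precision(docs, labels):
    """Return (recall, precision) for one clustering.

    docs   : list of sets of properties, one set per document
    labels : list of cluster ids, aligned with docs
    Assumption (not from the paper): a property is peculiar to the
    cluster where it occurs most often; ties go to the smallest id.
    """
    # Group document indices by cluster.
    members = defaultdict(list)
    for i, c in enumerate(labels):
        members[c].append(i)

    # Per cluster, count how many member documents carry each property.
    counts = {c: defaultdict(int) for c in members}
    for c, idxs in members.items():
        for i in idxs:
            for p in docs[i]:
                counts[c][p] += 1

    # Total number of documents carrying each property.
    total = defaultdict(int)
    for d in docs:
        for p in d:
            total[p] += 1

    # Assign each property to the cluster where it is most frequent.
    peculiar = defaultdict(list)
    for p in total:
        best = max(sorted(members), key=lambda c: counts[c][p])
        peculiar[best].append(p)

    recalls, precisions = [], []
    for c, props in peculiar.items():
        size = len(members[c])
        # Recall: share of a property's documents captured by the cluster.
        recalls.append(sum(counts[c][p] / total[p] for p in props) / len(props))
        # Precision: share of the cluster's documents carrying the property.
        precisions.append(sum(counts[c][p] / size for p in props) / len(props))

    n = len(recalls)
    return sum(recalls) / n, sum(precisions) / n


def closest_to_break_even(docs, labelings):
    """Pick the number of clusters whose recall and precision are closest.

    labelings : dict mapping a candidate number of clusters to its labels
    """
    def gap(k):
        r, p = cluster_recall_precision(docs, labelings[k])
        return abs(r - p)

    return min(sorted(labelings), key=gap)


# Toy data: four documents described by three properties.
docs = [{"a", "b"}, {"a"}, {"c"}, {"c", "b"}]
labelings = {1: [0, 0, 0, 0], 2: [0, 0, 1, 1], 4: [0, 1, 2, 3]}
best_k = closest_to_break_even(docs, labelings)
```

On this toy data, a single cluster favours recall, one cluster per document favours precision, and the two-cluster partition is the one where both values coincide, which is the sense in which a break-even point can suggest a number of clusters.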

Scientometrics
Language: English
Size: B5
Year of Foundation: 1978
Volumes per Year: 4
Issues per Year: 12
Founder: Akadémiai Kiadó
Founder's Address: H-1117 Budapest, Hungary 1516 Budapest, PO Box 245.
Publisher: Akadémiai Kiadó; Springer Nature Switzerland AG
Publisher's Address: H-1117 Budapest, Hungary 1516 Budapest, PO Box 245; CH-6330 Cham, Switzerland Gewerbestrasse 11.
Responsible Publisher: Chief Executive Officer, Akadémiai Kiadó
ISSN 0138-9130 (Print)
ISSN 1588-2861 (Online)