  • 1 Dipartimento di Informatica e Comunicazione, Università degli Studi di Milano, Milan, Italy; alfio.ferrara@unimi.it
  • 2 Dipartimento di Scienze Economiche, Aziendali e Statistiche, Università degli Studi di Milano, Milan, Italy

Abstract

The complexity and variety of bibliographic data are growing, and efforts to define new methodologies and techniques for bibliometric analysis are intensifying. In this scenario, two of the most crucial issues are the quality of data and the capability of bibliometric analysis to cope with multiple data dimensions. Although the problem of enforcing a multidimensional approach to the analysis and management of bibliographic data is not new, a reference design pattern and a specific conceptual model for multidimensional analysis of bibliographic data are still missing. In this paper, we discuss ten of the most relevant challenges for bibliometric analysis when dealing with multidimensional data, and we propose a reference data model that, depending on the goal of the analysis, can help analysis designers and bibliographic experts work with large collections of bibliographic data.
