Computer Science Department, IRIT UMR 5505 CNRS, University of Toulouse, 118 route de Narbonne, 31062 Toulouse Cedex 9, France

Abstract

Scientific literature recommender systems (SLRSs) suggest papers to researchers according to their scientific interests. These systems rely on inter-researcher similarity measures that are usually computed from publication contents (i.e., by extracting paper topics and citations). We highlight two major issues with this design. First, it requires full-text access and processing, which are expensive and hardly feasible. Second, clues about meetings, encounters, and informal exchanges between researchers (the social dimension) have not been exploited to date. To tackle these issues, we propose an original SLRS based on a threefold contribution. First, we argue the case for defining inter-researcher similarity measures that build on publicly available metadata. Second, we define topical and social measures that we combine to issue socio-topical recommendations. Third, we conduct an evaluation with 71 volunteer researchers to check researchers’ perception against socio-topical similarities. Experimental results show a significant 11.21% accuracy improvement of socio-topical recommendations over baseline topical recommendations.
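
To make the idea of combining a topical and a social measure concrete, here is a minimal Python sketch. It is an illustration only, not the measures defined in the paper: the abstract does not specify them, so the cosine similarity over title words, the Jaccard overlap of coauthor sets, the linear mix, and the weight `alpha` are all assumptions, as are the function names.

```python
from collections import Counter
from math import sqrt


def topical_similarity(titles_a, titles_b):
    """Cosine similarity between word-count vectors built from paper titles.

    Assumption: titles stand in for "publicly available metadata".
    """
    words_a = Counter(w for t in titles_a for w in t.lower().split())
    words_b = Counter(w for t in titles_b for w in t.lower().split())
    dot = sum(words_a[w] * words_b[w] for w in words_a.keys() & words_b.keys())
    norm_a = sqrt(sum(c * c for c in words_a.values()))
    norm_b = sqrt(sum(c * c for c in words_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def social_similarity(coauthors_a, coauthors_b):
    """Jaccard overlap of two researchers' coauthor sets.

    Assumption: shared coauthors are used as a proxy for social proximity.
    """
    a, b = set(coauthors_a), set(coauthors_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def socio_topical_similarity(topical, social, alpha=0.5):
    """Linear mix of the two scores; alpha is a hypothetical tuning weight."""
    return alpha * topical + (1 - alpha) * social


# Hypothetical metadata for two researchers.
topical = topical_similarity(
    ["Recommender systems for digital libraries"],
    ["Evaluating recommender systems"],
)
social = social_similarity({"Ana", "Bob"}, {"Bob", "Chloe"})
score = socio_topical_similarity(topical, social, alpha=0.5)
```

Both component scores lie in [0, 1], so the combined score does too; with `alpha=1.0` the sketch reduces to a purely topical baseline.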
