Scientific literature recommender systems (SLRSs) recommend papers to researchers according to their scientific interests. These systems rely on inter-researcher similarity measures that are usually computed from publication contents (i.e., by extracting paper topics and citations). We highlight two major issues with this design. The required full-text access and processing are expensive and often infeasible. Moreover, clues about meetings, encounters, and informal exchanges between researchers, which reflect a social dimension, have not been exploited to date. To tackle these issues, we propose an original SLRS based on a threefold contribution. First, we argue for defining inter-researcher similarity measures based on publicly available metadata. Second, we define topical and social measures that we combine to issue socio-topical recommendations. Third, we conduct an evaluation with 71 volunteer researchers to check researchers' perceptions against socio-topical similarities. Experimental results show a significant 11.21% accuracy improvement of socio-topical recommendations over baseline topical recommendations.
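The abstract does not say how the topical and social measures are defined or how they are combined. The following is only a minimal sketch under stated assumptions, not the authors' method. Topical similarity is assumed to be the cosine of term-frequency vectors built from paper titles, which are publicly available metadata. Social similarity is assumed to be the inverse shortest-path distance in a co-authorship graph. The two are merged by a linear combination with a hypothetical weight `alpha`. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter, deque


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def topical_similarity(titles_a: list[str], titles_b: list[str]) -> float:
    """Assumed topical measure: cosine over bag-of-words profiles of paper titles."""
    def profile(titles: list[str]) -> Counter:
        return Counter(word for title in titles for word in title.lower().split())
    return cosine(profile(titles_a), profile(titles_b))


def social_similarity(graph: dict[str, list[str]], a: str, b: str) -> float:
    """Assumed social measure: 1 / shortest co-authorship distance (0 if unreachable)."""
    if a == b:
        return 1.0
    seen = {a}
    queue = deque([(a, 0)])  # breadth-first search from researcher a
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == b:
                return 1.0 / (dist + 1)  # direct co-author -> 1.0, distance 2 -> 0.5
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return 0.0


def socio_topical(topical: float, social: float, alpha: float = 0.5) -> float:
    """Hypothetical linear combination; alpha weights the topical component."""
    return alpha * topical + (1 - alpha) * social


def recommend(target, candidates, titles, graph, alpha=0.5):
    """Rank candidate researchers by socio-topical similarity to the target."""
    scores = {
        c: socio_topical(
            topical_similarity(titles[target], titles[c]),
            social_similarity(graph, target, c),
            alpha,
        )
        for c in candidates
        if c != target
    }
    # Highest score first; ties broken alphabetically for a deterministic order.
    return sorted(scores, key=lambda c: (-scores[c], c))


# Hypothetical toy data: titles per researcher and an undirected co-authorship graph.
EXAMPLE_TITLES = {
    "alice": ["social recommender systems"],
    "bob": ["recommender systems evaluation"],
    "carol": ["protein folding"],
}
EXAMPLE_GRAPH = {"alice": ["dave"], "dave": ["alice", "carol"], "carol": ["dave"]}

if __name__ == "__main__":
    print(recommend("alice", ["bob", "carol"], EXAMPLE_TITLES, EXAMPLE_GRAPH, alpha=0.5))
    print(recommend("alice", ["bob", "carol"], EXAMPLE_TITLES, EXAMPLE_GRAPH, alpha=0.2))
```

In this toy example, Bob shares title terms with Alice but has no co-authorship path to her, while Carol is topically unrelated but two hops away through Dave. With `alpha=0.5` Bob ranks first, and with `alpha=0.2` (more social weight) Carol ranks first. This illustrates how a social signal can reorder purely topical recommendations.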