Authors: Gavin LaRowe, Sumeet Ambre, John Burgoon, Weimao Ke, and Katy Börner
The Scholarly Database aims to serve researchers and practitioners interested in the analysis, modelling, and visualization
of large-scale data sets. A specific focus of this database is to support macro-evolutionary studies of science and to communicate
findings via knowledge-domain visualizations. Currently, the database provides access to about 18 million publications, patents,
and grants. About 90% of the publications are available in full text. Except for some data sets with restricted access conditions,
the data can be retrieved in raw or pre-processed formats using either a web-based or a relational database client. This paper
motivates the need for the database from the perspective of bibliometric/scientometric research. It explains the database
design and setup, and reports the temporal, geographical, and topic coverage of the data sets currently served via the database.
Planned work and the potential for this database to become a global testbed for information science research are discussed
at the end of the paper.

Authors: Katy Börner, Weixia Huang, Micah Linnemeier, Russell Duhon, Patrick Phillips, Nianli Ma, Angela Zoss, Hanning Guo, and Mark Price
The enormous increase in digital scholarly data and computing power, combined with recent advances in text mining, linguistics,
network science, and scientometrics, makes it possible to scientifically study the structure and evolution of science on a large
scale. This paper discusses the challenges of this ‘BIG science of science’, also called ‘computational scientometrics’ research, in
terms of data access, algorithm scalability, repeatability, and result communication and interpretation. It then introduces
two infrastructures: (1) the Scholarly Database (SDB) (http://sdb.slis.indiana.edu), which provides free online access to
22 million scholarly records (papers, patents, and funding awards) that can be cross-searched and downloaded as dumps, and
(2) the scientometrics-relevant plug-ins of the open-source Network Workbench (NWB) Tool (http://nwb.slis.indiana.edu). The
utility of these infrastructures is then demonstrated in three example studies: a comparison of the funding portfolios
and co-investigator networks of different universities, an examination of paper-citation and co-author networks of major network
science researchers, and an analysis of topic bursts in streams of text. The article concludes with a discussion of related
work that aims to provide practically useful and theoretically grounded cyberinfrastructure in support of computational scientometrics
research, education and practice.

The file-drawer problem is the tendency of journals to preferentially publish studies with statistically significant results.
The problem is an old one and has been documented in various fields, but to the best of my knowledge no attention has been paid
to how it has developed quantitatively over time. In the abstracts indexed by several major scholarly databases (Science
and Social Science Citation Index, 1991–2008; CAB Abstracts and Medline, 1970s–2008), the file-drawer problem is gradually
getting worse, in spite of an increase in (1) the total number of publications and (2) the proportion of publications reporting
both the presence and the absence of significant differences. The trend is confirmed for particular natural science topics
such as biology, energy and environment, but not for papers retrieved with the keywords biodiversity, chemistry, computer,
engineering, genetics, psychology and quantum (physics). A worsening file-drawer problem can be detected in various medical
fields (infection, immunology, malaria, obesity, oncology and pharmacology), but not for papers indexed with strings such
as AIDS/HIV, epidemiology, health and neurology. An increase in the selective publication of some results over others
is worrying because it can increase bias in meta-analyses and hence distort the picture of the evidence for or
against a certain hypothesis. Long-term monitoring of the file-drawer problem is needed to ensure a sustainable and reliable
production of (peer-reviewed) scientific knowledge.